{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "QMjwq6pS-kFz"
      },
      "source": [
        "# Stock NeurIPS2018 Part 2. Train\n",
        "This series reproduces the process described in the paper *Practical Deep Reinforcement Learning Approach for Stock Trading*.\n",
        "\n",
        "This is the second part of the NeurIPS2018 series. It shows how to use FinRL to wrap the data in a gym-style environment and train DRL agents on it.\n",
        "\n",
        "Other demos can be found in the [FinRL-Tutorials](https://github.com/AI4Finance-Foundation/FinRL-Tutorials) repo."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "gT-zXutMgqOS"
      },
      "source": [
        "# Part 1. Install Packages"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 47,
      "metadata": {
        "id": "D0vEcPxSJ8hI"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Requirement already satisfied: swig in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (4.1.1)\n",
            "Requirement already satisfied: wrds in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (3.1.6)\n",
            "Requirement already satisfied: numpy in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from wrds) (1.26.0)\n",
            "Requirement already satisfied: pandas in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from wrds) (2.1.1)\n",
            "Requirement already satisfied: psycopg2-binary in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from wrds) (2.9.9)\n",
            "Requirement already satisfied: scipy in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from wrds) (1.11.3)\n",
            "Requirement already satisfied: sqlalchemy<2 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from wrds) (1.4.49)\n",
            "Requirement already satisfied: python-dateutil>=2.8.2 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from pandas->wrds) (2.8.2)\n",
            "Requirement already satisfied: pytz>=2020.1 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from pandas->wrds) (2023.3.post1)\n",
            "Requirement already satisfied: tzdata>=2022.1 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from pandas->wrds) (2023.3)\n",
            "Requirement already satisfied: six>=1.5 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from python-dateutil>=2.8.2->pandas->wrds) (1.16.0)\n",
            "Requirement already satisfied: pyportfolioopt in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (1.5.5)\n",
            "Requirement already satisfied: cvxpy<2.0.0,>=1.1.19 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from pyportfolioopt) (1.4.1)\n",
            "Requirement already satisfied: numpy<2.0.0,>=1.22.4 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from pyportfolioopt) (1.26.0)\n",
            "Requirement already satisfied: pandas>=0.19 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from pyportfolioopt) (2.1.1)\n",
            "Requirement already satisfied: scipy<2.0,>=1.3 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from pyportfolioopt) (1.11.3)\n",
            "Requirement already satisfied: osqp>=0.6.2 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from cvxpy<2.0.0,>=1.1.19->pyportfolioopt) (0.6.3)\n",
            "Requirement already satisfied: ecos>=2 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from cvxpy<2.0.0,>=1.1.19->pyportfolioopt) (2.0.12)\n",
            "Requirement already satisfied: clarabel>=0.5.0 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from cvxpy<2.0.0,>=1.1.19->pyportfolioopt) (0.6.0)\n",
            "Requirement already satisfied: scs>=3.0 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from cvxpy<2.0.0,>=1.1.19->pyportfolioopt) (3.2.3)\n",
            "Requirement already satisfied: pybind11 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from cvxpy<2.0.0,>=1.1.19->pyportfolioopt) (2.11.1)\n",
            "Requirement already satisfied: python-dateutil>=2.8.2 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from pandas>=0.19->pyportfolioopt) (2.8.2)\n",
            "Requirement already satisfied: pytz>=2020.1 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from pandas>=0.19->pyportfolioopt) (2023.3.post1)\n",
            "Requirement already satisfied: tzdata>=2022.1 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from pandas>=0.19->pyportfolioopt) (2023.3)\n",
            "Requirement already satisfied: qdldl in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from osqp>=0.6.2->cvxpy<2.0.0,>=1.1.19->pyportfolioopt) (0.1.7.post0)\n",
            "Requirement already satisfied: six>=1.5 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from python-dateutil>=2.8.2->pandas>=0.19->pyportfolioopt) (1.16.0)\n",
            "Collecting git+https://github.com/AI4Finance-Foundation/FinRL.git\n",
            "  Cloning https://github.com/AI4Finance-Foundation/FinRL.git to /private/var/folders/pc/8l9dz1f949ddzztd4yfcytx80000gn/T/pip-req-build-hnbhwp3e\n",
            "  Running command git clone --filter=blob:none --quiet https://github.com/AI4Finance-Foundation/FinRL.git /private/var/folders/pc/8l9dz1f949ddzztd4yfcytx80000gn/T/pip-req-build-hnbhwp3e\n",
            "  Resolved https://github.com/AI4Finance-Foundation/FinRL.git to commit 7c71056dd6d72e205096696319a2d8bd4a2bfe23\n",
            "  Installing build dependencies ... \u001b[?25ldone\n",
            "\u001b[?25h  Getting requirements to build wheel ... \u001b[?25ldone\n",
            "\u001b[?25h  Preparing metadata (pyproject.toml) ... \u001b[?25ldone\n",
            "\u001b[?25hCollecting elegantrl@ git+https://github.com/AI4Finance-Foundation/ElegantRL.git#egg=elegantrl (from finrl==0.3.6)\n",
            "  Cloning https://github.com/AI4Finance-Foundation/ElegantRL.git to /private/var/folders/pc/8l9dz1f949ddzztd4yfcytx80000gn/T/pip-install-yvrctbtl/elegantrl_f6e392a5a7f4448ca1a4f166b992b1bd\n",
            "  Running command git clone --filter=blob:none --quiet https://github.com/AI4Finance-Foundation/ElegantRL.git /private/var/folders/pc/8l9dz1f949ddzztd4yfcytx80000gn/T/pip-install-yvrctbtl/elegantrl_f6e392a5a7f4448ca1a4f166b992b1bd\n",
            "  Resolved https://github.com/AI4Finance-Foundation/ElegantRL.git to commit a68515548417093006eb7f68738b55a3a758645e\n",
            "  Preparing metadata (setup.py) ... \u001b[?25ldone\n",
            "\u001b[?25hRequirement already satisfied: alpaca-trade-api<4,>=3 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from finrl==0.3.6) (3.0.2)\n",
            "Requirement already satisfied: ccxt<4,>=3 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from finrl==0.3.6) (3.1.60)\n",
            "Requirement already satisfied: exchange-calendars<5,>=4 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from finrl==0.3.6) (4.5)\n",
            "Requirement already satisfied: jqdatasdk<2,>=1 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from finrl==0.3.6) (1.9.1)\n",
            "Requirement already satisfied: pyfolio<0.10,>=0.9 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from finrl==0.3.6) (0.9.2)\n",
            "Requirement already satisfied: pyportfolioopt<2,>=1 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from finrl==0.3.6) (1.5.5)\n",
            "Requirement already satisfied: ray[default,tune]<3,>=2 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from finrl==0.3.6) (2.7.1)\n",
            "Requirement already satisfied: scikit-learn<2,>=1 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from finrl==0.3.6) (1.3.1)\n",
            "Requirement already satisfied: stable-baselines3[extra]>=2.0.0a5 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from finrl==0.3.6) (2.1.0)\n",
            "Requirement already satisfied: stockstats<0.6,>=0.5 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from finrl==0.3.6) (0.5.4)\n",
            "Requirement already satisfied: wrds<4,>=3 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from finrl==0.3.6) (3.1.6)\n",
            "Requirement already satisfied: yfinance<0.3,>=0.2 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from finrl==0.3.6) (0.2.31)\n",
            "Requirement already satisfied: pandas>=0.18.1 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from alpaca-trade-api<4,>=3->finrl==0.3.6) (2.1.1)\n",
            "Requirement already satisfied: numpy>=1.11.1 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from alpaca-trade-api<4,>=3->finrl==0.3.6) (1.26.0)\n",
            "Requirement already satisfied: requests<3,>2 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from alpaca-trade-api<4,>=3->finrl==0.3.6) (2.31.0)\n",
            "Requirement already satisfied: urllib3<2,>1.24 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from alpaca-trade-api<4,>=3->finrl==0.3.6) (1.26.17)\n",
            "Requirement already satisfied: websocket-client<2,>=0.56.0 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from alpaca-trade-api<4,>=3->finrl==0.3.6) (1.6.4)\n",
            "Requirement already satisfied: websockets<11,>=9.0 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from alpaca-trade-api<4,>=3->finrl==0.3.6) (10.4)\n",
            "Requirement already satisfied: msgpack==1.0.3 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from alpaca-trade-api<4,>=3->finrl==0.3.6) (1.0.3)\n",
            "Requirement already satisfied: aiohttp==3.8.2 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from alpaca-trade-api<4,>=3->finrl==0.3.6) (3.8.2)\n",
            "Requirement already satisfied: PyYAML==6.0 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from alpaca-trade-api<4,>=3->finrl==0.3.6) (6.0)\n",
            "Requirement already satisfied: deprecation==2.1.0 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from alpaca-trade-api<4,>=3->finrl==0.3.6) (2.1.0)\n",
            "Requirement already satisfied: attrs>=17.3.0 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from aiohttp==3.8.2->alpaca-trade-api<4,>=3->finrl==0.3.6) (23.1.0)\n",
            "Requirement already satisfied: charset-normalizer<3.0,>=2.0 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from aiohttp==3.8.2->alpaca-trade-api<4,>=3->finrl==0.3.6) (2.1.1)\n",
            "Requirement already satisfied: multidict<6.0,>=4.5 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from aiohttp==3.8.2->alpaca-trade-api<4,>=3->finrl==0.3.6) (5.2.0)\n",
            "Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from aiohttp==3.8.2->alpaca-trade-api<4,>=3->finrl==0.3.6) (4.0.3)\n",
            "Requirement already satisfied: yarl<2.0,>=1.0 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from aiohttp==3.8.2->alpaca-trade-api<4,>=3->finrl==0.3.6) (1.9.2)\n",
            "Requirement already satisfied: frozenlist>=1.1.1 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from aiohttp==3.8.2->alpaca-trade-api<4,>=3->finrl==0.3.6) (1.4.0)\n",
            "Requirement already satisfied: aiosignal>=1.1.2 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from aiohttp==3.8.2->alpaca-trade-api<4,>=3->finrl==0.3.6) (1.3.1)\n",
            "Requirement already satisfied: packaging in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from deprecation==2.1.0->alpaca-trade-api<4,>=3->finrl==0.3.6) (23.2)\n",
            "Requirement already satisfied: setuptools>=60.9.0 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from ccxt<4,>=3->finrl==0.3.6) (68.0.0)\n",
            "Requirement already satisfied: certifi>=2018.1.18 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from ccxt<4,>=3->finrl==0.3.6) (2023.7.22)\n",
            "Requirement already satisfied: cryptography>=2.6.1 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from ccxt<4,>=3->finrl==0.3.6) (41.0.4)\n",
            "Requirement already satisfied: aiodns>=1.1.1 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from ccxt<4,>=3->finrl==0.3.6) (3.1.0)\n",
            "Requirement already satisfied: pyluach in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from exchange-calendars<5,>=4->finrl==0.3.6) (2.2.0)\n",
            "Requirement already satisfied: python-dateutil in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from exchange-calendars<5,>=4->finrl==0.3.6) (2.8.2)\n",
            "Requirement already satisfied: toolz in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from exchange-calendars<5,>=4->finrl==0.3.6) (0.12.0)\n",
            "Requirement already satisfied: tzdata in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from exchange-calendars<5,>=4->finrl==0.3.6) (2023.3)\n",
            "Requirement already satisfied: korean-lunar-calendar in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from exchange-calendars<5,>=4->finrl==0.3.6) (0.3.1)\n",
            "Requirement already satisfied: six in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from jqdatasdk<2,>=1->finrl==0.3.6) (1.16.0)\n",
            "Requirement already satisfied: SQLAlchemy>=1.2.8 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from jqdatasdk<2,>=1->finrl==0.3.6) (1.4.49)\n",
            "Requirement already satisfied: thriftpy2>=0.3.9 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from jqdatasdk<2,>=1->finrl==0.3.6) (0.4.17)\n",
            "Requirement already satisfied: pymysql>=0.7.6 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from jqdatasdk<2,>=1->finrl==0.3.6) (1.1.0)\n",
            "Requirement already satisfied: ipython>=3.2.3 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from pyfolio<0.10,>=0.9->finrl==0.3.6) (8.16.1)\n",
            "Requirement already satisfied: matplotlib>=1.4.0 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from pyfolio<0.10,>=0.9->finrl==0.3.6) (3.8.0)\n",
            "Requirement already satisfied: pytz>=2014.10 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from pyfolio<0.10,>=0.9->finrl==0.3.6) (2023.3.post1)\n",
            "Requirement already satisfied: scipy>=0.14.0 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from pyfolio<0.10,>=0.9->finrl==0.3.6) (1.11.3)\n",
            "Requirement already satisfied: seaborn>=0.7.1 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from pyfolio<0.10,>=0.9->finrl==0.3.6) (0.13.0)\n",
            "Requirement already satisfied: empyrical>=0.5.0 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from pyfolio<0.10,>=0.9->finrl==0.3.6) (0.5.5)\n",
            "Requirement already satisfied: cvxpy<2.0.0,>=1.1.19 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from pyportfolioopt<2,>=1->finrl==0.3.6) (1.4.1)\n",
            "Requirement already satisfied: click>=7.0 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from ray[default,tune]<3,>=2->finrl==0.3.6) (8.1.7)\n",
            "Requirement already satisfied: filelock in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from ray[default,tune]<3,>=2->finrl==0.3.6) (3.12.4)\n",
            "Requirement already satisfied: jsonschema in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from ray[default,tune]<3,>=2->finrl==0.3.6) (4.19.1)\n",
            "Requirement already satisfied: protobuf!=3.19.5,>=3.15.3 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from ray[default,tune]<3,>=2->finrl==0.3.6) (4.24.4)\n",
            "Requirement already satisfied: aiohttp-cors in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from ray[default,tune]<3,>=2->finrl==0.3.6) (0.7.0)\n",
            "Requirement already satisfied: colorful in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from ray[default,tune]<3,>=2->finrl==0.3.6) (0.5.5)\n",
            "Requirement already satisfied: py-spy>=0.2.0 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from ray[default,tune]<3,>=2->finrl==0.3.6) (0.3.14)\n",
            "Requirement already satisfied: gpustat>=1.0.0 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from ray[default,tune]<3,>=2->finrl==0.3.6) (1.1.1)\n",
            "Requirement already satisfied: opencensus in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from ray[default,tune]<3,>=2->finrl==0.3.6) (0.11.3)\n",
            "Requirement already satisfied: pydantic<2 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from ray[default,tune]<3,>=2->finrl==0.3.6) (1.10.13)\n",
            "Requirement already satisfied: prometheus-client>=0.7.1 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from ray[default,tune]<3,>=2->finrl==0.3.6) (0.17.1)\n",
            "Requirement already satisfied: smart-open in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from ray[default,tune]<3,>=2->finrl==0.3.6) (6.4.0)\n",
            "Requirement already satisfied: virtualenv<20.21.1,>=20.0.24 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from ray[default,tune]<3,>=2->finrl==0.3.6) (20.21.0)\n",
            "Requirement already satisfied: grpcio>=1.42.0 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from ray[default,tune]<3,>=2->finrl==0.3.6) (1.59.0)\n",
            "Requirement already satisfied: tensorboardX>=1.9 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from ray[default,tune]<3,>=2->finrl==0.3.6) (2.6.2.2)\n",
            "Requirement already satisfied: pyarrow>=6.0.1 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from ray[default,tune]<3,>=2->finrl==0.3.6) (13.0.0)\n",
            "Requirement already satisfied: fsspec in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from ray[default,tune]<3,>=2->finrl==0.3.6) (2023.9.2)\n",
            "Requirement already satisfied: joblib>=1.1.1 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from scikit-learn<2,>=1->finrl==0.3.6) (1.3.2)\n",
            "Requirement already satisfied: threadpoolctl>=2.0.0 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from scikit-learn<2,>=1->finrl==0.3.6) (3.2.0)\n",
            "Requirement already satisfied: gymnasium<0.30,>=0.28.1 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from stable-baselines3[extra]>=2.0.0a5->finrl==0.3.6) (0.29.1)\n",
            "Requirement already satisfied: torch>=1.13 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from stable-baselines3[extra]>=2.0.0a5->finrl==0.3.6) (2.1.0)\n",
            "Requirement already satisfied: cloudpickle in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from stable-baselines3[extra]>=2.0.0a5->finrl==0.3.6) (2.2.1)\n",
            "Requirement already satisfied: opencv-python in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from stable-baselines3[extra]>=2.0.0a5->finrl==0.3.6) (4.8.1.78)\n",
            "Requirement already satisfied: pygame in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from stable-baselines3[extra]>=2.0.0a5->finrl==0.3.6) (2.1.0)\n",
            "Requirement already satisfied: tensorboard>=2.9.1 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from stable-baselines3[extra]>=2.0.0a5->finrl==0.3.6) (2.14.1)\n",
            "Requirement already satisfied: psutil in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from stable-baselines3[extra]>=2.0.0a5->finrl==0.3.6) (5.9.0)\n",
            "Requirement already satisfied: tqdm in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from stable-baselines3[extra]>=2.0.0a5->finrl==0.3.6) (4.66.1)\n",
            "Requirement already satisfied: rich in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from stable-baselines3[extra]>=2.0.0a5->finrl==0.3.6) (13.6.0)\n",
            "Requirement already satisfied: shimmy[atari]~=1.1.0 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from stable-baselines3[extra]>=2.0.0a5->finrl==0.3.6) (1.1.0)\n",
            "Requirement already satisfied: pillow in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from stable-baselines3[extra]>=2.0.0a5->finrl==0.3.6) (10.0.1)\n",
            "Requirement already satisfied: autorom[accept-rom-license]~=0.6.1 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from stable-baselines3[extra]>=2.0.0a5->finrl==0.3.6) (0.6.1)\n",
            "Requirement already satisfied: psycopg2-binary in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from wrds<4,>=3->finrl==0.3.6) (2.9.9)\n",
            "Requirement already satisfied: multitasking>=0.0.7 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from yfinance<0.3,>=0.2->finrl==0.3.6) (0.0.11)\n",
            "Requirement already satisfied: lxml>=4.9.1 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from yfinance<0.3,>=0.2->finrl==0.3.6) (4.9.3)\n",
            "Requirement already satisfied: appdirs>=1.4.4 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from yfinance<0.3,>=0.2->finrl==0.3.6) (1.4.4)\n",
            "Requirement already satisfied: frozendict>=2.3.4 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from yfinance<0.3,>=0.2->finrl==0.3.6) (2.3.8)\n",
            "Requirement already satisfied: peewee>=3.16.2 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from yfinance<0.3,>=0.2->finrl==0.3.6) (3.17.0)\n",
            "Requirement already satisfied: beautifulsoup4>=4.11.1 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from yfinance<0.3,>=0.2->finrl==0.3.6) (4.12.2)\n",
            "Requirement already satisfied: html5lib>=1.1 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from yfinance<0.3,>=0.2->finrl==0.3.6) (1.1)\n",
            "Requirement already satisfied: gym in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from elegantrl@ git+https://github.com/AI4Finance-Foundation/ElegantRL.git#egg=elegantrl->finrl==0.3.6) (0.26.2)\n",
            "Requirement already satisfied: pycares>=4.0.0 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from aiodns>=1.1.1->ccxt<4,>=3->finrl==0.3.6) (4.4.0)\n",
            "Requirement already satisfied: AutoROM.accept-rom-license in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from autorom[accept-rom-license]~=0.6.1->stable-baselines3[extra]>=2.0.0a5->finrl==0.3.6) (0.6.1)\n",
            "Requirement already satisfied: soupsieve>1.2 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from beautifulsoup4>=4.11.1->yfinance<0.3,>=0.2->finrl==0.3.6) (2.5)\n",
            "Requirement already satisfied: cffi>=1.12 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from cryptography>=2.6.1->ccxt<4,>=3->finrl==0.3.6) (1.16.0)\n",
            "Requirement already satisfied: osqp>=0.6.2 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from cvxpy<2.0.0,>=1.1.19->pyportfolioopt<2,>=1->finrl==0.3.6) (0.6.3)\n",
            "Requirement already satisfied: ecos>=2 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from cvxpy<2.0.0,>=1.1.19->pyportfolioopt<2,>=1->finrl==0.3.6) (2.0.12)\n",
            "Requirement already satisfied: clarabel>=0.5.0 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from cvxpy<2.0.0,>=1.1.19->pyportfolioopt<2,>=1->finrl==0.3.6) (0.6.0)\n",
            "Requirement already satisfied: scs>=3.0 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from cvxpy<2.0.0,>=1.1.19->pyportfolioopt<2,>=1->finrl==0.3.6) (3.2.3)\n",
            "Requirement already satisfied: pybind11 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from cvxpy<2.0.0,>=1.1.19->pyportfolioopt<2,>=1->finrl==0.3.6) (2.11.1)\n",
            "Requirement already satisfied: pandas-datareader>=0.2 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from empyrical>=0.5.0->pyfolio<0.10,>=0.9->finrl==0.3.6) (0.10.0)\n",
            "Requirement already satisfied: nvidia-ml-py>=11.450.129 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from gpustat>=1.0.0->ray[default,tune]<3,>=2->finrl==0.3.6) (12.535.108)\n",
            "Requirement already satisfied: blessed>=1.17.1 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from gpustat>=1.0.0->ray[default,tune]<3,>=2->finrl==0.3.6) (1.20.0)\n",
            "Requirement already satisfied: typing-extensions>=4.3.0 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from gymnasium<0.30,>=0.28.1->stable-baselines3[extra]>=2.0.0a5->finrl==0.3.6) (4.8.0)\n",
            "Requirement already satisfied: farama-notifications>=0.0.1 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from gymnasium<0.30,>=0.28.1->stable-baselines3[extra]>=2.0.0a5->finrl==0.3.6) (0.0.4)\n",
            "Requirement already satisfied: webencodings in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from html5lib>=1.1->yfinance<0.3,>=0.2->finrl==0.3.6) (0.5.1)\n",
            "Requirement already satisfied: backcall in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from ipython>=3.2.3->pyfolio<0.10,>=0.9->finrl==0.3.6) (0.2.0)\n",
            "Requirement already satisfied: decorator in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from ipython>=3.2.3->pyfolio<0.10,>=0.9->finrl==0.3.6) (5.1.1)\n",
            "Requirement already satisfied: jedi>=0.16 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from ipython>=3.2.3->pyfolio<0.10,>=0.9->finrl==0.3.6) (0.19.1)\n",
            "Requirement already satisfied: matplotlib-inline in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from ipython>=3.2.3->pyfolio<0.10,>=0.9->finrl==0.3.6) (0.1.6)\n",
            "Requirement already satisfied: pickleshare in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from ipython>=3.2.3->pyfolio<0.10,>=0.9->finrl==0.3.6) (0.7.5)\n",
            "Requirement already satisfied: prompt-toolkit!=3.0.37,<3.1.0,>=3.0.30 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from ipython>=3.2.3->pyfolio<0.10,>=0.9->finrl==0.3.6) (3.0.39)\n",
            "Requirement already satisfied: pygments>=2.4.0 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from ipython>=3.2.3->pyfolio<0.10,>=0.9->finrl==0.3.6) (2.16.1)\n",
            "Requirement already satisfied: stack-data in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from ipython>=3.2.3->pyfolio<0.10,>=0.9->finrl==0.3.6) (0.6.2)\n",
            "Requirement already satisfied: traitlets>=5 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from ipython>=3.2.3->pyfolio<0.10,>=0.9->finrl==0.3.6) (5.11.2)\n",
            "Requirement already satisfied: exceptiongroup in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from ipython>=3.2.3->pyfolio<0.10,>=0.9->finrl==0.3.6) (1.1.3)\n",
            "Requirement already satisfied: pexpect>4.3 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from ipython>=3.2.3->pyfolio<0.10,>=0.9->finrl==0.3.6) (4.8.0)\n",
            "Requirement already satisfied: appnope in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from ipython>=3.2.3->pyfolio<0.10,>=0.9->finrl==0.3.6) (0.1.3)\n",
            "Requirement already satisfied: contourpy>=1.0.1 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from matplotlib>=1.4.0->pyfolio<0.10,>=0.9->finrl==0.3.6) (1.1.1)\n",
            "Requirement already satisfied: cycler>=0.10 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from matplotlib>=1.4.0->pyfolio<0.10,>=0.9->finrl==0.3.6) (0.12.1)\n",
            "Requirement already satisfied: fonttools>=4.22.0 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from matplotlib>=1.4.0->pyfolio<0.10,>=0.9->finrl==0.3.6) (4.43.1)\n",
            "Requirement already satisfied: kiwisolver>=1.0.1 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from matplotlib>=1.4.0->pyfolio<0.10,>=0.9->finrl==0.3.6) (1.4.5)\n",
            "Requirement already satisfied: pyparsing>=2.3.1 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from matplotlib>=1.4.0->pyfolio<0.10,>=0.9->finrl==0.3.6) (3.1.1)\n",
            "Requirement already satisfied: idna<4,>=2.5 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from requests<3,>2->alpaca-trade-api<4,>=3->finrl==0.3.6) (3.4)\n",
            "Requirement already satisfied: ale-py~=0.8.1 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from shimmy[atari]~=1.1.0->stable-baselines3[extra]>=2.0.0a5->finrl==0.3.6) (0.8.1)\n",
            "Requirement already satisfied: absl-py>=0.4 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from tensorboard>=2.9.1->stable-baselines3[extra]>=2.0.0a5->finrl==0.3.6) (2.0.0)\n",
            "Requirement already satisfied: google-auth<3,>=1.6.3 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from tensorboard>=2.9.1->stable-baselines3[extra]>=2.0.0a5->finrl==0.3.6) (2.23.3)\n",
            "Requirement already satisfied: google-auth-oauthlib<1.1,>=0.5 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from tensorboard>=2.9.1->stable-baselines3[extra]>=2.0.0a5->finrl==0.3.6) (1.0.0)\n",
            "Requirement already satisfied: markdown>=2.6.8 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from tensorboard>=2.9.1->stable-baselines3[extra]>=2.0.0a5->finrl==0.3.6) (3.5)\n",
            "Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from tensorboard>=2.9.1->stable-baselines3[extra]>=2.0.0a5->finrl==0.3.6) (0.7.1)\n",
            "Requirement already satisfied: werkzeug>=1.0.1 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from tensorboard>=2.9.1->stable-baselines3[extra]>=2.0.0a5->finrl==0.3.6) (3.0.0)\n",
            "Requirement already satisfied: ply<4.0,>=3.4 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from thriftpy2>=0.3.9->jqdatasdk<2,>=1->finrl==0.3.6) (3.11)\n",
            "Requirement already satisfied: sympy in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from torch>=1.13->stable-baselines3[extra]>=2.0.0a5->finrl==0.3.6) (1.12)\n",
            "Requirement already satisfied: networkx in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from torch>=1.13->stable-baselines3[extra]>=2.0.0a5->finrl==0.3.6) (3.1)\n",
            "Requirement already satisfied: jinja2 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from torch>=1.13->stable-baselines3[extra]>=2.0.0a5->finrl==0.3.6) (3.1.2)\n",
            "Requirement already satisfied: distlib<1,>=0.3.6 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from virtualenv<20.21.1,>=20.0.24->ray[default,tune]<3,>=2->finrl==0.3.6) (0.3.7)\n",
            "Requirement already satisfied: platformdirs<4,>=2.4 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from virtualenv<20.21.1,>=20.0.24->ray[default,tune]<3,>=2->finrl==0.3.6) (3.11.0)\n",
            "Requirement already satisfied: gym-notices>=0.0.4 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from gym->elegantrl@ git+https://github.com/AI4Finance-Foundation/ElegantRL.git#egg=elegantrl->finrl==0.3.6) (0.0.8)\n",
            "Requirement already satisfied: box2d-py==2.3.5 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from gym->elegantrl@ git+https://github.com/AI4Finance-Foundation/ElegantRL.git#egg=elegantrl->finrl==0.3.6) (2.3.5)\n",
            "Requirement already satisfied: swig==4.* in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from gym->elegantrl@ git+https://github.com/AI4Finance-Foundation/ElegantRL.git#egg=elegantrl->finrl==0.3.6) (4.1.1)\n",
            "Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from jsonschema->ray[default,tune]<3,>=2->finrl==0.3.6) (2023.7.1)\n",
            "Requirement already satisfied: referencing>=0.28.4 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from jsonschema->ray[default,tune]<3,>=2->finrl==0.3.6) (0.30.2)\n",
            "Requirement already satisfied: rpds-py>=0.7.1 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from jsonschema->ray[default,tune]<3,>=2->finrl==0.3.6) (0.10.6)\n",
            "Requirement already satisfied: opencensus-context>=0.1.3 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from opencensus->ray[default,tune]<3,>=2->finrl==0.3.6) (0.1.3)\n",
            "Requirement already satisfied: google-api-core<3.0.0,>=1.0.0 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from opencensus->ray[default,tune]<3,>=2->finrl==0.3.6) (2.12.0)\n",
            "Requirement already satisfied: markdown-it-py>=2.2.0 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from rich->stable-baselines3[extra]>=2.0.0a5->finrl==0.3.6) (3.0.0)\n",
            "Requirement already satisfied: importlib-resources in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from ale-py~=0.8.1->shimmy[atari]~=1.1.0->stable-baselines3[extra]>=2.0.0a5->finrl==0.3.6) (6.1.0)\n",
            "Requirement already satisfied: wcwidth>=0.1.4 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from blessed>=1.17.1->gpustat>=1.0.0->ray[default,tune]<3,>=2->finrl==0.3.6) (0.2.8)\n",
            "Requirement already satisfied: pycparser in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from cffi>=1.12->cryptography>=2.6.1->ccxt<4,>=3->finrl==0.3.6) (2.21)\n",
            "Requirement already satisfied: googleapis-common-protos<2.0.dev0,>=1.56.2 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from google-api-core<3.0.0,>=1.0.0->opencensus->ray[default,tune]<3,>=2->finrl==0.3.6) (1.61.0)\n",
            "Requirement already satisfied: cachetools<6.0,>=2.0.0 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from google-auth<3,>=1.6.3->tensorboard>=2.9.1->stable-baselines3[extra]>=2.0.0a5->finrl==0.3.6) (5.3.1)\n",
            "Requirement already satisfied: pyasn1-modules>=0.2.1 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from google-auth<3,>=1.6.3->tensorboard>=2.9.1->stable-baselines3[extra]>=2.0.0a5->finrl==0.3.6) (0.3.0)\n",
            "Requirement already satisfied: rsa<5,>=3.1.4 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from google-auth<3,>=1.6.3->tensorboard>=2.9.1->stable-baselines3[extra]>=2.0.0a5->finrl==0.3.6) (4.9)\n",
            "Requirement already satisfied: requests-oauthlib>=0.7.0 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from google-auth-oauthlib<1.1,>=0.5->tensorboard>=2.9.1->stable-baselines3[extra]>=2.0.0a5->finrl==0.3.6) (1.3.1)\n",
            "Requirement already satisfied: parso<0.9.0,>=0.8.3 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from jedi>=0.16->ipython>=3.2.3->pyfolio<0.10,>=0.9->finrl==0.3.6) (0.8.3)\n",
            "Requirement already satisfied: mdurl~=0.1 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from markdown-it-py>=2.2.0->rich->stable-baselines3[extra]>=2.0.0a5->finrl==0.3.6) (0.1.2)\n",
            "Requirement already satisfied: qdldl in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from osqp>=0.6.2->cvxpy<2.0.0,>=1.1.19->pyportfolioopt<2,>=1->finrl==0.3.6) (0.1.7.post0)\n",
            "Requirement already satisfied: ptyprocess>=0.5 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from pexpect>4.3->ipython>=3.2.3->pyfolio<0.10,>=0.9->finrl==0.3.6) (0.7.0)\n",
            "Requirement already satisfied: MarkupSafe>=2.1.1 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from werkzeug>=1.0.1->tensorboard>=2.9.1->stable-baselines3[extra]>=2.0.0a5->finrl==0.3.6) (2.1.3)\n",
            "Requirement already satisfied: executing>=1.2.0 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from stack-data->ipython>=3.2.3->pyfolio<0.10,>=0.9->finrl==0.3.6) (1.2.0)\n",
            "Requirement already satisfied: asttokens>=2.1.0 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from stack-data->ipython>=3.2.3->pyfolio<0.10,>=0.9->finrl==0.3.6) (2.4.0)\n",
            "Requirement already satisfied: pure-eval in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from stack-data->ipython>=3.2.3->pyfolio<0.10,>=0.9->finrl==0.3.6) (0.2.2)\n",
            "Requirement already satisfied: mpmath>=0.19 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from sympy->torch>=1.13->stable-baselines3[extra]>=2.0.0a5->finrl==0.3.6) (1.3.0)\n",
            "Requirement already satisfied: pyasn1<0.6.0,>=0.4.6 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard>=2.9.1->stable-baselines3[extra]>=2.0.0a5->finrl==0.3.6) (0.5.0)\n",
            "Requirement already satisfied: oauthlib>=3.0.0 in /opt/homebrew/Caskroom/miniconda/base/envs/finRL/lib/python3.10/site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<1.1,>=0.5->tensorboard>=2.9.1->stable-baselines3[extra]>=2.0.0a5->finrl==0.3.6) (3.2.2)\n"
          ]
        }
      ],
      "source": [
        "## install required packages\n",
        "!pip install swig\n",
        "!pip install wrds\n",
        "!pip install pyportfolioopt\n",
        "## install finrl library\n",
        "!pip install git+https://github.com/AI4Finance-Foundation/FinRL.git"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 48,
      "metadata": {
        "id": "xt1317y2ixSS"
      },
      "outputs": [],
      "source": [
        "import os\n",
        "import pandas as pd\n",
        "\n",
        "from finrl.meta.env_stock_trading.env_stocktrading import StockTradingEnv\n",
        "from finrl.agents.stablebaselines3.models import DRLAgent\n",
        "from stable_baselines3.common.logger import configure\n",
        "from finrl import config_tickers\n",
        "from finrl.main import check_and_make_directories\n",
        "from finrl.config import INDICATORS, TRAINED_MODEL_DIR, RESULTS_DIR\n",
        "\n",
        "check_and_make_directories([TRAINED_MODEL_DIR])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "aWrSrQv3i0Ng"
      },
      "source": [
        "# Part 2. Build A Market Environment in OpenAI Gym-style"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "wiHhM2U-XBMZ"
      },
      "source": [
        "![rl_diagram_transparent_bg.png]()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "LeneTRdyZDvy"
      },
      "source": [
        "The core elements in reinforcement learning are the **agent** and the **environment**. You can understand RL as the following process:\n",
        "\n",
        "The agent acts within a world, the environment. It observes its current condition as a **state**, and is allowed to take certain **actions**. After the agent executes an action, it arrives at a new state. At the same time, the environment gives the agent feedback called a **reward**: a numerical signal that indicates how good or bad the new state is. As the figure above shows, the agent and the environment keep repeating this interaction.\n",
        "\n",
        "The agent's goal is to accumulate as much reward as possible. Reinforcement learning is the process by which the agent learns to improve its behavior and achieve that goal."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "w3H88JXkI93v"
      },
      "source": [
        "To achieve this in Python, we follow the OpenAI Gym style to turn the stock data into an environment.\n",
        "\n",
        "The state, action, and reward are specified as follows:\n",
        "\n",
        "* **State s**: The state space represents the agent's perception of the market environment. Just like a human trader analyzing various information, our agent passively observes the price data and technical indicators computed from past data. It learns by interacting with the market environment (usually by replaying historical data).\n",
        "\n",
        "* **Action a**: The action space includes the allowed actions that the agent can take at each state. For example, a ∈ {−1, 0, 1}, where −1, 0, 1 represent\n",
        "selling, holding, and buying. When an action operates on multiple shares, a ∈ {−k, ..., −1, 0, 1, ..., k}; e.g., \"Buy 10 shares of AAPL\" and \"Sell 10 shares of AAPL\" are represented as 10 and −10, respectively.\n",
        "\n",
        "* **Reward function r(s, a, s′)**: The reward is an incentive for the agent to learn a better policy. For example, it can be the change in portfolio value when taking action a at state s and arriving at the new state s′, i.e., r(s, a, s′) = v′ − v, where v′ and v represent the portfolio values at states s′ and s, respectively.\n",
        "\n",
        "\n",
        "**Market environment**: 30 constituent stocks of Dow Jones Industrial Average (DJIA) index. Accessed at the starting date of the testing period."
      ]
    },
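    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As a small worked example of the reward function above (the dollar figures are illustrative, not from the data): if the portfolio is worth v = \\$1,000,000 at state s and v′ = \\$1,002,500 after taking action a, then r(s, a, s′) = 1,002,500 − 1,000,000 = 2,500. Note that the FinRL environment also multiplies this raw difference by a `reward_scaling` factor (set to 1e-4 later in this notebook), in which case the reward the agent actually observes would be 0.25."
      ]
    },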
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "SKyZejI0fmp1"
      },
      "source": [
        "## Read data\n",
        "\n",
        "We first read the .csv file of our training data into a dataframe."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 49,
      "metadata": {
        "id": "mFCP1YEhi6oi"
      },
      "outputs": [],
      "source": [
        "train = pd.read_csv('train_data.csv')\n",
        "\n",
        "# If you are not using the data generated in Part 1 of this tutorial, make sure\n",
        "# it has columns and an index in a form that the environment can ingest.\n",
        "# If it does, you can comment out and skip the following two lines.\n",
        "train = train.set_index(train.columns[0])\n",
        "train.index.names = ['']"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Yw95ZMicgEyi"
      },
      "source": [
        "## Construct the environment"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5WZ6-9q2gq9S"
      },
      "source": [
        "Calculate and specify the parameters we need for constructing the environment."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 50,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "7T3DZPoaIm8k",
        "outputId": "4817e063-400a-416e-f8f2-4b1c4d9c8408"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Stock Dimension: 29, State Space: 291\n"
          ]
        }
      ],
      "source": [
        "stock_dimension = len(train.tic.unique())\n",
        "state_space = 1 + 2*stock_dimension + len(INDICATORS)*stock_dimension\n",
        "print(f\"Stock Dimension: {stock_dimension}, State Space: {state_space}\")"
      ]
    },
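    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To unpack the state-space formula above: the state vector concatenates the cash balance (1 entry), the close price and the number of shares held for each stock (2 × stock_dimension entries), and each technical indicator for each stock (len(INDICATORS) × stock_dimension entries). With 29 stocks and the 8 indicators in FinRL's default `INDICATORS` list, that gives 1 + 2 × 29 + 8 × 29 = 291, matching the printed output."
      ]
    },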
    {
      "cell_type": "code",
      "execution_count": 51,
      "metadata": {
        "id": "WsOLoeNcJF8Q"
      },
      "outputs": [],
      "source": [
        "buy_cost_list = sell_cost_list = [0.001] * stock_dimension\n",
        "num_stock_shares = [0] * stock_dimension\n",
        "\n",
        "env_kwargs = {\n",
        "    \"hmax\": 100,\n",
        "    \"initial_amount\": 1000000,\n",
        "    \"num_stock_shares\": num_stock_shares,\n",
        "    \"buy_cost_pct\": buy_cost_list,\n",
        "    \"sell_cost_pct\": sell_cost_list,\n",
        "    \"state_space\": state_space,\n",
        "    \"stock_dim\": stock_dimension,\n",
        "    \"tech_indicator_list\": INDICATORS,\n",
        "    \"action_space\": stock_dimension,\n",
        "    \"reward_scaling\": 1e-4\n",
        "}\n",
        "\n",
        "\n",
        "e_train_gym = StockTradingEnv(df = train, **env_kwargs)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7We-q73jjaFQ"
      },
      "source": [
        "## Environment for training"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 52,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "aS-SHiGRJK-4",
        "outputId": "a733ecdf-d857-40f5-b399-4325c7ead299"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "<class 'stable_baselines3.common.vec_env.dummy_vec_env.DummyVecEnv'>\n"
          ]
        }
      ],
      "source": [
        "env_train, _ = e_train_gym.get_sb_env()\n",
        "print(type(env_train))"
      ]
    },
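    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As the printed type shows, `get_sb_env` wraps our single gym-style environment in a Stable Baselines 3 `DummyVecEnv`, the vectorized-environment interface that SB3 algorithms expect. With only one underlying environment, it simply steps that environment sequentially in the current process; no extra parallelism is involved."
      ]
    },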
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "HMNR5nHjh1iz"
      },
      "source": [
        "# Part 3: Train DRL Agents\n",
        "* Here, the DRL algorithms are from **[Stable Baselines 3](https://stable-baselines3.readthedocs.io/en/master/)**, a library that implements popular DRL algorithms in PyTorch, succeeding its predecessor, Stable Baselines.\n",
        "* Users are also encouraged to try **[ElegantRL](https://github.com/AI4Finance-Foundation/ElegantRL)** and **[Ray RLlib](https://github.com/ray-project/ray)**."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 53,
      "metadata": {
        "id": "364PsqckttcQ"
      },
      "outputs": [],
      "source": [
        "agent = DRLAgent(env = env_train)\n",
        "\n",
        "# Set the corresponding values to 'True' for the algorithms that you want to use\n",
        "if_using_a2c = True\n",
        "if_using_ddpg = True\n",
        "if_using_ppo = True\n",
        "if_using_td3 = True\n",
        "if_using_sac = True"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YDmqOyF9h1iz"
      },
      "source": [
        "## Agent Training: 5 algorithms (A2C, DDPG, PPO, TD3, SAC)\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "uijiWgkuh1jB"
      },
      "source": [
        "### Agent 1: A2C\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 54,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "GUCnkn-HIbmj",
        "outputId": "2794a094-a916-448c-ead1-6e20184dde2a"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "{'n_steps': 5, 'ent_coef': 0.01, 'learning_rate': 0.0007}\n",
            "Using cpu device\n",
            "Logging to results/a2c\n"
          ]
        }
      ],
      "source": [
        "agent = DRLAgent(env = env_train)\n",
        "model_a2c = agent.get_model(\"a2c\")\n",
        "\n",
        "if if_using_a2c:\n",
        "  # set up logger\n",
        "  tmp_path = RESULTS_DIR + '/a2c'\n",
        "  new_logger_a2c = configure(tmp_path, [\"stdout\", \"csv\", \"tensorboard\"])\n",
        "  # Set new logger\n",
        "  model_a2c.set_logger(new_logger_a2c)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 55,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "0GVpkWGqH4-D",
        "outputId": "f29cf145-e3b5-4e59-f64d-5921462a8f81"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "----------------------------------------\n",
            "| time/                 |              |\n",
            "|    fps                | 224          |\n",
            "|    iterations         | 100          |\n",
            "|    time_elapsed       | 2            |\n",
            "|    total_timesteps    | 500          |\n",
            "| train/                |              |\n",
            "|    entropy_loss       | -41.3        |\n",
            "|    explained_variance | -0.223       |\n",
            "|    learning_rate      | 0.0007       |\n",
            "|    n_updates          | 99           |\n",
            "|    policy_loss        | -50.3        |\n",
            "|    reward             | -0.037465353 |\n",
            "|    std                | 1.01         |\n",
            "|    value_loss         | 1.89         |\n",
            "----------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 227        |\n",
            "|    iterations         | 200        |\n",
            "|    time_elapsed       | 4          |\n",
            "|    total_timesteps    | 1000       |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -41.3      |\n",
            "|    explained_variance | -1.19e-07  |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 199        |\n",
            "|    policy_loss        | -66        |\n",
            "|    reward             | -1.1094745 |\n",
            "|    std                | 1          |\n",
            "|    value_loss         | 3.69       |\n",
            "--------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 229       |\n",
            "|    iterations         | 300       |\n",
            "|    time_elapsed       | 6         |\n",
            "|    total_timesteps    | 1500      |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -41.3     |\n",
            "|    explained_variance | -0.00935  |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 299       |\n",
            "|    policy_loss        | -355      |\n",
            "|    reward             | 5.7870684 |\n",
            "|    std                | 1         |\n",
            "|    value_loss         | 78.5      |\n",
            "-------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 228       |\n",
            "|    iterations         | 400       |\n",
            "|    time_elapsed       | 8         |\n",
            "|    total_timesteps    | 2000      |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -41.3     |\n",
            "|    explained_variance | 5.96e-08  |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 399       |\n",
            "|    policy_loss        | -13.3     |\n",
            "|    reward             | 4.2819147 |\n",
            "|    std                | 1.01      |\n",
            "|    value_loss         | 6.53      |\n",
            "-------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 229       |\n",
            "|    iterations         | 500       |\n",
            "|    time_elapsed       | 10        |\n",
            "|    total_timesteps    | 2500      |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -41.3     |\n",
            "|    explained_variance | 5.96e-08  |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 499       |\n",
            "|    policy_loss        | 484       |\n",
            "|    reward             | -6.584406 |\n",
            "|    std                | 1.01      |\n",
            "|    value_loss         | 170       |\n",
            "-------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 231        |\n",
            "|    iterations         | 600        |\n",
            "|    time_elapsed       | 12         |\n",
            "|    total_timesteps    | 3000       |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -41.3      |\n",
            "|    explained_variance | 0.132      |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 599        |\n",
            "|    policy_loss        | 208        |\n",
            "|    reward             | 0.12205061 |\n",
            "|    std                | 1          |\n",
            "|    value_loss         | 25.4       |\n",
            "--------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 232        |\n",
            "|    iterations         | 700        |\n",
            "|    time_elapsed       | 15         |\n",
            "|    total_timesteps    | 3500       |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -41.3      |\n",
            "|    explained_variance | -0.682     |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 699        |\n",
            "|    policy_loss        | -14.2      |\n",
            "|    reward             | -2.9013717 |\n",
            "|    std                | 1.01       |\n",
            "|    value_loss         | 0.388      |\n",
            "--------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 233        |\n",
            "|    iterations         | 800        |\n",
            "|    time_elapsed       | 17         |\n",
            "|    total_timesteps    | 4000       |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -41.4      |\n",
            "|    explained_variance | -0.6       |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 799        |\n",
            "|    policy_loss        | 15.2       |\n",
            "|    reward             | -2.5466058 |\n",
            "|    std                | 1.01       |\n",
            "|    value_loss         | 1.67       |\n",
            "--------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 234        |\n",
            "|    iterations         | 900        |\n",
            "|    time_elapsed       | 19         |\n",
            "|    total_timesteps    | 4500       |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -41.4      |\n",
            "|    explained_variance | 0          |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 899        |\n",
            "|    policy_loss        | 74.1       |\n",
            "|    reward             | 0.29885504 |\n",
            "|    std                | 1.01       |\n",
            "|    value_loss         | 4.2        |\n",
            "--------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 235        |\n",
            "|    iterations         | 1000       |\n",
            "|    time_elapsed       | 21         |\n",
            "|    total_timesteps    | 5000       |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -41.4      |\n",
            "|    explained_variance | 0.27       |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 999        |\n",
            "|    policy_loss        | 20.8       |\n",
            "|    reward             | -3.0598388 |\n",
            "|    std                | 1.01       |\n",
            "|    value_loss         | 0.738      |\n",
            "--------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 236       |\n",
            "|    iterations         | 1100      |\n",
            "|    time_elapsed       | 23        |\n",
            "|    total_timesteps    | 5500      |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -41.4     |\n",
            "|    explained_variance | -1.19e-07 |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 1099      |\n",
            "|    policy_loss        | -290      |\n",
            "|    reward             | 2.3655646 |\n",
            "|    std                | 1.01      |\n",
            "|    value_loss         | 49.4      |\n",
            "-------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 236        |\n",
            "|    iterations         | 1200       |\n",
            "|    time_elapsed       | 25         |\n",
            "|    total_timesteps    | 6000       |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -41.4      |\n",
            "|    explained_variance | -0.042     |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 1199       |\n",
            "|    policy_loss        | -153       |\n",
            "|    reward             | -0.1700692 |\n",
            "|    std                | 1.01       |\n",
            "|    value_loss         | 15.6       |\n",
            "--------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 236       |\n",
            "|    iterations         | 1300      |\n",
            "|    time_elapsed       | 27        |\n",
            "|    total_timesteps    | 6500      |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -41.4     |\n",
            "|    explained_variance | 0         |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 1299      |\n",
            "|    policy_loss        | 31.6      |\n",
            "|    reward             | -3.308388 |\n",
            "|    std                | 1.01      |\n",
            "|    value_loss         | 5.45      |\n",
            "-------------------------------------\n",
            "------------------------------------\n",
            "| time/                 |          |\n",
            "|    fps                | 236      |\n",
            "|    iterations         | 1400     |\n",
            "|    time_elapsed       | 29       |\n",
            "|    total_timesteps    | 7000     |\n",
            "| train/                |          |\n",
            "|    entropy_loss       | -41.4    |\n",
            "|    explained_variance | -0.282   |\n",
            "|    learning_rate      | 0.0007   |\n",
            "|    n_updates          | 1399     |\n",
            "|    policy_loss        | 90.4     |\n",
            "|    reward             | 2.044719 |\n",
            "|    std                | 1.01     |\n",
            "|    value_loss         | 8.05     |\n",
            "------------------------------------\n",
            "------------------------------------\n",
            "| time/                 |          |\n",
            "|    fps                | 237      |\n",
            "|    iterations         | 1500     |\n",
            "|    time_elapsed       | 31       |\n",
            "|    total_timesteps    | 7500     |\n",
            "| train/                |          |\n",
            "|    entropy_loss       | -41.4    |\n",
            "|    explained_variance | 0        |\n",
            "|    learning_rate      | 0.0007   |\n",
            "|    n_updates          | 1499     |\n",
            "|    policy_loss        | 159      |\n",
            "|    reward             | 2.234015 |\n",
            "|    std                | 1.01     |\n",
            "|    value_loss         | 19       |\n",
            "------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 237       |\n",
            "|    iterations         | 1600      |\n",
            "|    time_elapsed       | 33        |\n",
            "|    total_timesteps    | 8000      |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -41.5     |\n",
            "|    explained_variance | 0         |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 1599      |\n",
            "|    policy_loss        | 604       |\n",
            "|    reward             | 0.6812807 |\n",
            "|    std                | 1.01      |\n",
            "|    value_loss         | 204       |\n",
            "-------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 238       |\n",
            "|    iterations         | 1700      |\n",
            "|    time_elapsed       | 35        |\n",
            "|    total_timesteps    | 8500      |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -41.6     |\n",
            "|    explained_variance | -1.19e-07 |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 1699      |\n",
            "|    policy_loss        | 17.7      |\n",
            "|    reward             | 3.5924714 |\n",
            "|    std                | 1.01      |\n",
            "|    value_loss         | 13.7      |\n",
            "-------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 238       |\n",
            "|    iterations         | 1800      |\n",
            "|    time_elapsed       | 37        |\n",
            "|    total_timesteps    | 9000      |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -41.6     |\n",
            "|    explained_variance | -1.19e-07 |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 1799      |\n",
            "|    policy_loss        | -17.7     |\n",
            "|    reward             | 1.2373297 |\n",
            "|    std                | 1.02      |\n",
            "|    value_loss         | 0.881     |\n",
            "-------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 238       |\n",
            "|    iterations         | 1900      |\n",
            "|    time_elapsed       | 39        |\n",
            "|    total_timesteps    | 9500      |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -41.6     |\n",
            "|    explained_variance | -0.00659  |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 1899      |\n",
            "|    policy_loss        | 33.3      |\n",
            "|    reward             | 0.7601475 |\n",
            "|    std                | 1.02      |\n",
            "|    value_loss         | 1.87      |\n",
            "-------------------------------------\n",
            "------------------------------------\n",
            "| time/                 |          |\n",
            "|    fps                | 238      |\n",
            "|    iterations         | 2000     |\n",
            "|    time_elapsed       | 41       |\n",
            "|    total_timesteps    | 10000    |\n",
            "| train/                |          |\n",
            "|    entropy_loss       | -41.5    |\n",
            "|    explained_variance | 0        |\n",
            "|    learning_rate      | 0.0007   |\n",
            "|    n_updates          | 1999     |\n",
            "|    policy_loss        | 108      |\n",
            "|    reward             | 0.324368 |\n",
            "|    std                | 1.01     |\n",
            "|    value_loss         | 18       |\n",
            "------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 238        |\n",
            "|    iterations         | 2100       |\n",
            "|    time_elapsed       | 43         |\n",
            "|    total_timesteps    | 10500      |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -41.5      |\n",
            "|    explained_variance | 0          |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 2099       |\n",
            "|    policy_loss        | 2.23       |\n",
            "|    reward             | 0.73916864 |\n",
            "|    std                | 1.01       |\n",
            "|    value_loss         | 1.07       |\n",
            "--------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 238        |\n",
            "|    iterations         | 2200       |\n",
            "|    time_elapsed       | 46         |\n",
            "|    total_timesteps    | 11000      |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -41.5      |\n",
            "|    explained_variance | 1.19e-07   |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 2199       |\n",
            "|    policy_loss        | 53.9       |\n",
            "|    reward             | -7.7945447 |\n",
            "|    std                | 1.01       |\n",
            "|    value_loss         | 6.99       |\n",
            "--------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 238        |\n",
            "|    iterations         | 2300       |\n",
            "|    time_elapsed       | 48         |\n",
            "|    total_timesteps    | 11500      |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -41.5      |\n",
            "|    explained_variance | 0          |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 2299       |\n",
            "|    policy_loss        | -2.54e+03  |\n",
            "|    reward             | -13.428976 |\n",
            "|    std                | 1.01       |\n",
            "|    value_loss         | 4.37e+03   |\n",
            "--------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 239       |\n",
            "|    iterations         | 2400      |\n",
            "|    time_elapsed       | 50        |\n",
            "|    total_timesteps    | 12000     |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -41.5     |\n",
            "|    explained_variance | -0.0036   |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 2399      |\n",
            "|    policy_loss        | 72.2      |\n",
            "|    reward             | 0.6531278 |\n",
            "|    std                | 1.01      |\n",
            "|    value_loss         | 7.36      |\n",
            "-------------------------------------\n",
            "----------------------------------------\n",
            "| time/                 |              |\n",
            "|    fps                | 239          |\n",
            "|    iterations         | 2500         |\n",
            "|    time_elapsed       | 52           |\n",
            "|    total_timesteps    | 12500        |\n",
            "| train/                |              |\n",
            "|    entropy_loss       | -41.5        |\n",
            "|    explained_variance | 1.19e-07     |\n",
            "|    learning_rate      | 0.0007       |\n",
            "|    n_updates          | 2499         |\n",
            "|    policy_loss        | -93.6        |\n",
            "|    reward             | -0.054788433 |\n",
            "|    std                | 1.01         |\n",
            "|    value_loss         | 5.55         |\n",
            "----------------------------------------\n",
            "------------------------------------\n",
            "| time/                 |          |\n",
            "|    fps                | 239      |\n",
            "|    iterations         | 2600     |\n",
            "|    time_elapsed       | 54       |\n",
            "|    total_timesteps    | 13000    |\n",
            "| train/                |          |\n",
            "|    entropy_loss       | -41.6    |\n",
            "|    explained_variance | 0        |\n",
            "|    learning_rate      | 0.0007   |\n",
            "|    n_updates          | 2599     |\n",
            "|    policy_loss        | -44      |\n",
            "|    reward             | 2.054459 |\n",
            "|    std                | 1.02     |\n",
            "|    value_loss         | 1.29     |\n",
            "------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 239        |\n",
            "|    iterations         | 2700       |\n",
            "|    time_elapsed       | 56         |\n",
            "|    total_timesteps    | 13500      |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -41.6      |\n",
            "|    explained_variance | 0          |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 2699       |\n",
            "|    policy_loss        | -38.7      |\n",
            "|    reward             | -1.6641923 |\n",
            "|    std                | 1.02       |\n",
            "|    value_loss         | 2.43       |\n",
            "--------------------------------------\n",
            "------------------------------------\n",
            "| time/                 |          |\n",
            "|    fps                | 239      |\n",
            "|    iterations         | 2800     |\n",
            "|    time_elapsed       | 58       |\n",
            "|    total_timesteps    | 14000    |\n",
            "| train/                |          |\n",
            "|    entropy_loss       | -41.7    |\n",
            "|    explained_variance | 0        |\n",
            "|    learning_rate      | 0.0007   |\n",
            "|    n_updates          | 2799     |\n",
            "|    policy_loss        | 479      |\n",
            "|    reward             | 3.717238 |\n",
            "|    std                | 1.02     |\n",
            "|    value_loss         | 121      |\n",
            "------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 239       |\n",
            "|    iterations         | 2900      |\n",
            "|    time_elapsed       | 60        |\n",
            "|    total_timesteps    | 14500     |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -41.7     |\n",
            "|    explained_variance | 0         |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 2899      |\n",
            "|    policy_loss        | -79.4     |\n",
            "|    reward             | 1.3675122 |\n",
            "|    std                | 1.02      |\n",
            "|    value_loss         | 4         |\n",
            "-------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 239       |\n",
            "|    iterations         | 3000      |\n",
            "|    time_elapsed       | 62        |\n",
            "|    total_timesteps    | 15000     |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -41.7     |\n",
            "|    explained_variance | 0         |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 2999      |\n",
            "|    policy_loss        | 64.1      |\n",
            "|    reward             | 1.4422324 |\n",
            "|    std                | 1.02      |\n",
            "|    value_loss         | 4.48      |\n",
            "-------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 239        |\n",
            "|    iterations         | 3100       |\n",
            "|    time_elapsed       | 64         |\n",
            "|    total_timesteps    | 15500      |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -41.7      |\n",
            "|    explained_variance | 0          |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 3099       |\n",
            "|    policy_loss        | 67.3       |\n",
            "|    reward             | -1.1655574 |\n",
            "|    std                | 1.02       |\n",
            "|    value_loss         | 4.82       |\n",
            "--------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 239        |\n",
            "|    iterations         | 3200       |\n",
            "|    time_elapsed       | 66         |\n",
            "|    total_timesteps    | 16000      |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -41.7      |\n",
            "|    explained_variance | 0          |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 3199       |\n",
            "|    policy_loss        | 1.85       |\n",
            "|    reward             | -2.1980412 |\n",
            "|    std                | 1.02       |\n",
            "|    value_loss         | 7.14       |\n",
            "--------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 239        |\n",
            "|    iterations         | 3300       |\n",
            "|    time_elapsed       | 68         |\n",
            "|    total_timesteps    | 16500      |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -41.8      |\n",
            "|    explained_variance | 0.000289   |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 3299       |\n",
            "|    policy_loss        | -103       |\n",
            "|    reward             | 0.73173565 |\n",
            "|    std                | 1.02       |\n",
            "|    value_loss         | 9.18       |\n",
            "--------------------------------------\n",
            "------------------------------------\n",
            "| time/                 |          |\n",
            "|    fps                | 239      |\n",
            "|    iterations         | 3400     |\n",
            "|    time_elapsed       | 70       |\n",
            "|    total_timesteps    | 17000    |\n",
            "| train/                |          |\n",
            "|    entropy_loss       | -41.8    |\n",
            "|    explained_variance | 1.19e-07 |\n",
            "|    learning_rate      | 0.0007   |\n",
            "|    n_updates          | 3399     |\n",
            "|    policy_loss        | -40.7    |\n",
            "|    reward             | 9.409764 |\n",
            "|    std                | 1.02     |\n",
            "|    value_loss         | 9.72     |\n",
            "------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 239        |\n",
            "|    iterations         | 3500       |\n",
            "|    time_elapsed       | 73         |\n",
            "|    total_timesteps    | 17500      |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -41.8      |\n",
            "|    explained_variance | -0.0564    |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 3499       |\n",
            "|    policy_loss        | 185        |\n",
            "|    reward             | 0.76249176 |\n",
            "|    std                | 1.02       |\n",
            "|    value_loss         | 29.2       |\n",
            "--------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 239       |\n",
            "|    iterations         | 3600      |\n",
            "|    time_elapsed       | 75        |\n",
            "|    total_timesteps    | 18000     |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -41.8     |\n",
            "|    explained_variance | 0         |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 3599      |\n",
            "|    policy_loss        | -97.6     |\n",
            "|    reward             | 1.6970446 |\n",
            "|    std                | 1.02      |\n",
            "|    value_loss         | 9.71      |\n",
            "-------------------------------------\n",
            "---------------------------------------\n",
            "| time/                 |             |\n",
            "|    fps                | 239         |\n",
            "|    iterations         | 3700        |\n",
            "|    time_elapsed       | 77          |\n",
            "|    total_timesteps    | 18500       |\n",
            "| train/                |             |\n",
            "|    entropy_loss       | -41.8       |\n",
            "|    explained_variance | 0           |\n",
            "|    learning_rate      | 0.0007      |\n",
            "|    n_updates          | 3699        |\n",
            "|    policy_loss        | 187         |\n",
            "|    reward             | -0.09103918 |\n",
            "|    std                | 1.02        |\n",
            "|    value_loss         | 25.3        |\n",
            "---------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 239        |\n",
            "|    iterations         | 3800       |\n",
            "|    time_elapsed       | 79         |\n",
            "|    total_timesteps    | 19000      |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -41.8      |\n",
            "|    explained_variance | 0          |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 3799       |\n",
            "|    policy_loss        | 58         |\n",
            "|    reward             | 0.23512848 |\n",
            "|    std                | 1.02       |\n",
            "|    value_loss         | 2.5        |\n",
            "--------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 239       |\n",
            "|    iterations         | 3900      |\n",
            "|    time_elapsed       | 81        |\n",
            "|    total_timesteps    | 19500     |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -41.7     |\n",
            "|    explained_variance | 0         |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 3899      |\n",
            "|    policy_loss        | -225      |\n",
            "|    reward             | 0.7290803 |\n",
            "|    std                | 1.02      |\n",
            "|    value_loss         | 35.6      |\n",
            "-------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 239       |\n",
            "|    iterations         | 4000      |\n",
            "|    time_elapsed       | 83        |\n",
            "|    total_timesteps    | 20000     |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -41.7     |\n",
            "|    explained_variance | 1.19e-07  |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 3999      |\n",
            "|    policy_loss        | 4.38      |\n",
            "|    reward             | 4.4288836 |\n",
            "|    std                | 1.02      |\n",
            "|    value_loss         | 8.83      |\n",
            "-------------------------------------\n",
            "---------------------------------------\n",
            "| time/                 |             |\n",
            "|    fps                | 239         |\n",
            "|    iterations         | 4100        |\n",
            "|    time_elapsed       | 85          |\n",
            "|    total_timesteps    | 20500       |\n",
            "| train/                |             |\n",
            "|    entropy_loss       | -41.8       |\n",
            "|    explained_variance | -0.0252     |\n",
            "|    learning_rate      | 0.0007      |\n",
            "|    n_updates          | 4099        |\n",
            "|    policy_loss        | -20.2       |\n",
            "|    reward             | -0.45327786 |\n",
            "|    std                | 1.02        |\n",
            "|    value_loss         | 1.52        |\n",
            "---------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 239        |\n",
            "|    iterations         | 4200       |\n",
            "|    time_elapsed       | 87         |\n",
            "|    total_timesteps    | 21000      |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -41.8      |\n",
            "|    explained_variance | 0          |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 4199       |\n",
            "|    policy_loss        | -296       |\n",
            "|    reward             | 0.28228554 |\n",
            "|    std                | 1.02       |\n",
            "|    value_loss         | 47.5       |\n",
            "--------------------------------------\n",
            "------------------------------------\n",
            "| time/                 |          |\n",
            "|    fps                | 239      |\n",
            "|    iterations         | 4300     |\n",
            "|    time_elapsed       | 89       |\n",
            "|    total_timesteps    | 21500    |\n",
            "| train/                |          |\n",
            "|    entropy_loss       | -41.8    |\n",
            "|    explained_variance | 1.89e-05 |\n",
            "|    learning_rate      | 0.0007   |\n",
            "|    n_updates          | 4299     |\n",
            "|    policy_loss        | -128     |\n",
            "|    reward             | 3.795049 |\n",
            "|    std                | 1.02     |\n",
            "|    value_loss         | 10.7     |\n",
            "------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 239       |\n",
            "|    iterations         | 4400      |\n",
            "|    time_elapsed       | 91        |\n",
            "|    total_timesteps    | 22000     |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -41.8     |\n",
            "|    explained_variance | -1.19e-07 |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 4399      |\n",
            "|    policy_loss        | 147       |\n",
            "|    reward             | 1.6300098 |\n",
            "|    std                | 1.02      |\n",
            "|    value_loss         | 19.4      |\n",
            "-------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 239        |\n",
            "|    iterations         | 4500       |\n",
            "|    time_elapsed       | 93         |\n",
            "|    total_timesteps    | 22500      |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -41.7      |\n",
            "|    explained_variance | 0          |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 4499       |\n",
            "|    policy_loss        | 49.9       |\n",
            "|    reward             | -1.7055401 |\n",
            "|    std                | 1.02       |\n",
            "|    value_loss         | 5.62       |\n",
            "--------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 239       |\n",
            "|    iterations         | 4600      |\n",
            "|    time_elapsed       | 95        |\n",
            "|    total_timesteps    | 23000     |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -41.8     |\n",
            "|    explained_variance | 0         |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 4599      |\n",
            "|    policy_loss        | 207       |\n",
            "|    reward             | 6.2434287 |\n",
            "|    std                | 1.02      |\n",
            "|    value_loss         | 29.8      |\n",
            "-------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 239       |\n",
            "|    iterations         | 4700      |\n",
            "|    time_elapsed       | 98        |\n",
            "|    total_timesteps    | 23500     |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -41.8     |\n",
            "|    explained_variance | -9.11e-05 |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 4699      |\n",
            "|    policy_loss        | -117      |\n",
            "|    reward             | 0.6695327 |\n",
            "|    std                | 1.02      |\n",
            "|    value_loss         | 13.8      |\n",
            "-------------------------------------\n",
            "---------------------------------------\n",
            "| time/                 |             |\n",
            "|    fps                | 239         |\n",
            "|    iterations         | 4800        |\n",
            "|    time_elapsed       | 100         |\n",
            "|    total_timesteps    | 24000       |\n",
            "| train/                |             |\n",
            "|    entropy_loss       | -41.8       |\n",
            "|    explained_variance | 5.96e-08    |\n",
            "|    learning_rate      | 0.0007      |\n",
            "|    n_updates          | 4799        |\n",
            "|    policy_loss        | -233        |\n",
            "|    reward             | -0.89176166 |\n",
            "|    std                | 1.02        |\n",
            "|    value_loss         | 34.3        |\n",
            "---------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 239       |\n",
            "|    iterations         | 4900      |\n",
            "|    time_elapsed       | 102       |\n",
            "|    total_timesteps    | 24500     |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -41.7     |\n",
            "|    explained_variance | 1.19e-07  |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 4899      |\n",
            "|    policy_loss        | -32.8     |\n",
            "|    reward             | 1.3403009 |\n",
            "|    std                | 1.02      |\n",
            "|    value_loss         | 4.4       |\n",
            "-------------------------------------\n",
            "------------------------------------\n",
            "| time/                 |          |\n",
            "|    fps                | 239      |\n",
            "|    iterations         | 5000     |\n",
            "|    time_elapsed       | 104      |\n",
            "|    total_timesteps    | 25000    |\n",
            "| train/                |          |\n",
            "|    entropy_loss       | -41.7    |\n",
            "|    explained_variance | 0        |\n",
            "|    learning_rate      | 0.0007   |\n",
            "|    n_updates          | 4999     |\n",
            "|    policy_loss        | -66.3    |\n",
            "|    reward             | 2.62325  |\n",
            "|    std                | 1.02     |\n",
            "|    value_loss         | 6.67     |\n",
            "------------------------------------\n",
            "---------------------------------------\n",
            "| time/                 |             |\n",
            "|    fps                | 239         |\n",
            "|    iterations         | 5100        |\n",
            "|    time_elapsed       | 106         |\n",
            "|    total_timesteps    | 25500       |\n",
            "| train/                |             |\n",
            "|    entropy_loss       | -41.8       |\n",
            "|    explained_variance | 0           |\n",
            "|    learning_rate      | 0.0007      |\n",
            "|    n_updates          | 5099        |\n",
            "|    policy_loss        | 361         |\n",
            "|    reward             | -0.43670595 |\n",
            "|    std                | 1.02        |\n",
            "|    value_loss         | 80          |\n",
            "---------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 239       |\n",
            "|    iterations         | 5200      |\n",
            "|    time_elapsed       | 108       |\n",
            "|    total_timesteps    | 26000     |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -41.7     |\n",
            "|    explained_variance | -3.54e-05 |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 5199      |\n",
            "|    policy_loss        | -439      |\n",
            "|    reward             | 19.544321 |\n",
            "|    std                | 1.02      |\n",
            "|    value_loss         | 208       |\n",
            "-------------------------------------\n",
            "day: 2892, episode: 10\n",
            "begin_total_asset: 1000000.00\n",
            "end_total_asset: 9252879.94\n",
            "total_reward: 8252879.94\n",
            "total_cost: 58131.76\n",
            "total_trades: 54255\n",
            "Sharpe: 1.103\n",
            "=================================\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 239        |\n",
            "|    iterations         | 5300       |\n",
            "|    time_elapsed       | 110        |\n",
            "|    total_timesteps    | 26500      |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -41.7      |\n",
            "|    explained_variance | -0.00821   |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 5299       |\n",
            "|    policy_loss        | -13        |\n",
            "|    reward             | 0.07991036 |\n",
            "|    std                | 1.02       |\n",
            "|    value_loss         | 1.52       |\n",
            "--------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 239        |\n",
            "|    iterations         | 5400       |\n",
            "|    time_elapsed       | 112        |\n",
            "|    total_timesteps    | 27000      |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -41.8      |\n",
            "|    explained_variance | 0          |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 5399       |\n",
            "|    policy_loss        | -113       |\n",
            "|    reward             | -0.3657025 |\n",
            "|    std                | 1.02       |\n",
            "|    value_loss         | 8.56       |\n",
            "--------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 239       |\n",
            "|    iterations         | 5500      |\n",
            "|    time_elapsed       | 114       |\n",
            "|    total_timesteps    | 27500     |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -41.8     |\n",
            "|    explained_variance | -1.19e-07 |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 5499      |\n",
            "|    policy_loss        | 172       |\n",
            "|    reward             | 4.382681  |\n",
            "|    std                | 1.02      |\n",
            "|    value_loss         | 33.8      |\n",
            "-------------------------------------\n",
            "------------------------------------\n",
            "| time/                 |          |\n",
            "|    fps                | 239      |\n",
            "|    iterations         | 5600     |\n",
            "|    time_elapsed       | 116      |\n",
            "|    total_timesteps    | 28000    |\n",
            "| train/                |          |\n",
            "|    entropy_loss       | -41.8    |\n",
            "|    explained_variance | -0.252   |\n",
            "|    learning_rate      | 0.0007   |\n",
            "|    n_updates          | 5599     |\n",
            "|    policy_loss        | -63.1    |\n",
            "|    reward             | 1.837829 |\n",
            "|    std                | 1.02     |\n",
            "|    value_loss         | 23.1     |\n",
            "------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 239       |\n",
            "|    iterations         | 5700      |\n",
            "|    time_elapsed       | 118       |\n",
            "|    total_timesteps    | 28500     |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -41.9     |\n",
            "|    explained_variance | 0         |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 5699      |\n",
            "|    policy_loss        | -784      |\n",
            "|    reward             | -9.476494 |\n",
            "|    std                | 1.03      |\n",
            "|    value_loss         | 436       |\n",
            "-------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 239       |\n",
            "|    iterations         | 5800      |\n",
            "|    time_elapsed       | 120       |\n",
            "|    total_timesteps    | 29000     |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -41.9     |\n",
            "|    explained_variance | -4.52     |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 5799      |\n",
            "|    policy_loss        | 93.1      |\n",
            "|    reward             | 1.3581291 |\n",
            "|    std                | 1.03      |\n",
            "|    value_loss         | 14.4      |\n",
            "-------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 239        |\n",
            "|    iterations         | 5900       |\n",
            "|    time_elapsed       | 122        |\n",
            "|    total_timesteps    | 29500      |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -41.9      |\n",
            "|    explained_variance | 0          |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 5899       |\n",
            "|    policy_loss        | 99.7       |\n",
            "|    reward             | 0.44164747 |\n",
            "|    std                | 1.03       |\n",
            "|    value_loss         | 8.4        |\n",
            "--------------------------------------\n",
            "---------------------------------------\n",
            "| time/                 |             |\n",
            "|    fps                | 239         |\n",
            "|    iterations         | 6000        |\n",
            "|    time_elapsed       | 125         |\n",
            "|    total_timesteps    | 30000       |\n",
            "| train/                |             |\n",
            "|    entropy_loss       | -41.9       |\n",
            "|    explained_variance | -0.0238     |\n",
            "|    learning_rate      | 0.0007      |\n",
            "|    n_updates          | 5999        |\n",
            "|    policy_loss        | 101         |\n",
            "|    reward             | -0.89125896 |\n",
            "|    std                | 1.03        |\n",
            "|    value_loss         | 8.2         |\n",
            "---------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 239       |\n",
            "|    iterations         | 6100      |\n",
            "|    time_elapsed       | 127       |\n",
            "|    total_timesteps    | 30500     |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -41.9     |\n",
            "|    explained_variance | 0         |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 6099      |\n",
            "|    policy_loss        | -152      |\n",
            "|    reward             | -5.963742 |\n",
            "|    std                | 1.03      |\n",
            "|    value_loss         | 20.7      |\n",
            "-------------------------------------\n",
            "---------------------------------------\n",
            "| time/                 |             |\n",
            "|    fps                | 239         |\n",
            "|    iterations         | 6200        |\n",
            "|    time_elapsed       | 129         |\n",
            "|    total_timesteps    | 31000       |\n",
            "| train/                |             |\n",
            "|    entropy_loss       | -41.8       |\n",
            "|    explained_variance | 0           |\n",
            "|    learning_rate      | 0.0007      |\n",
            "|    n_updates          | 6199        |\n",
            "|    policy_loss        | -160        |\n",
            "|    reward             | -0.22414756 |\n",
            "|    std                | 1.03        |\n",
            "|    value_loss         | 20.3        |\n",
            "---------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 239       |\n",
            "|    iterations         | 6300      |\n",
            "|    time_elapsed       | 131       |\n",
            "|    total_timesteps    | 31500     |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -41.9     |\n",
            "|    explained_variance | -1.19e-07 |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 6299      |\n",
            "|    policy_loss        | 778       |\n",
            "|    reward             | 5.905317  |\n",
            "|    std                | 1.03      |\n",
            "|    value_loss         | 510       |\n",
            "-------------------------------------\n",
            "------------------------------------\n",
            "| time/                 |          |\n",
            "|    fps                | 239      |\n",
            "|    iterations         | 6400     |\n",
            "|    time_elapsed       | 133      |\n",
            "|    total_timesteps    | 32000    |\n",
            "| train/                |          |\n",
            "|    entropy_loss       | -41.9    |\n",
            "|    explained_variance | 0        |\n",
            "|    learning_rate      | 0.0007   |\n",
            "|    n_updates          | 6399     |\n",
            "|    policy_loss        | 114      |\n",
            "|    reward             | 2.542227 |\n",
            "|    std                | 1.03     |\n",
            "|    value_loss         | 8.24     |\n",
            "------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 239        |\n",
            "|    iterations         | 6500       |\n",
            "|    time_elapsed       | 135        |\n",
            "|    total_timesteps    | 32500      |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -41.9      |\n",
            "|    explained_variance | 0          |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 6499       |\n",
            "|    policy_loss        | 93.3       |\n",
            "|    reward             | -5.9873405 |\n",
            "|    std                | 1.03       |\n",
            "|    value_loss         | 18.5       |\n",
            "--------------------------------------\n",
            "---------------------------------------\n",
            "| time/                 |             |\n",
            "|    fps                | 239         |\n",
            "|    iterations         | 6600        |\n",
            "|    time_elapsed       | 137         |\n",
            "|    total_timesteps    | 33000       |\n",
            "| train/                |             |\n",
            "|    entropy_loss       | -42         |\n",
            "|    explained_variance | -0.0326     |\n",
            "|    learning_rate      | 0.0007      |\n",
            "|    n_updates          | 6599        |\n",
            "|    policy_loss        | 16.9        |\n",
            "|    reward             | -0.05295783 |\n",
            "|    std                | 1.03        |\n",
            "|    value_loss         | 3.18        |\n",
            "---------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 240        |\n",
            "|    iterations         | 6700       |\n",
            "|    time_elapsed       | 139        |\n",
            "|    total_timesteps    | 33500      |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -41.9      |\n",
            "|    explained_variance | 0.00849    |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 6699       |\n",
            "|    policy_loss        | -1.48e+03  |\n",
            "|    reward             | -11.972401 |\n",
            "|    std                | 1.03       |\n",
            "|    value_loss         | 1.3e+03    |\n",
            "--------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 240       |\n",
            "|    iterations         | 6800      |\n",
            "|    time_elapsed       | 141       |\n",
            "|    total_timesteps    | 34000     |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -41.9     |\n",
            "|    explained_variance | 0         |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 6799      |\n",
            "|    policy_loss        | -122      |\n",
            "|    reward             | 1.3245162 |\n",
            "|    std                | 1.03      |\n",
            "|    value_loss         | 15.5      |\n",
            "-------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 240        |\n",
            "|    iterations         | 6900       |\n",
            "|    time_elapsed       | 143        |\n",
            "|    total_timesteps    | 34500      |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -41.9      |\n",
            "|    explained_variance | 0          |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 6899       |\n",
            "|    policy_loss        | -392       |\n",
            "|    reward             | -2.3928673 |\n",
            "|    std                | 1.03       |\n",
            "|    value_loss         | 164        |\n",
            "--------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 240       |\n",
            "|    iterations         | 7000      |\n",
            "|    time_elapsed       | 145       |\n",
            "|    total_timesteps    | 35000     |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -41.9     |\n",
            "|    explained_variance | -9.06e-06 |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 6999      |\n",
            "|    policy_loss        | 89        |\n",
            "|    reward             | 0.090436  |\n",
            "|    std                | 1.03      |\n",
            "|    value_loss         | 4.63      |\n",
            "-------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 240       |\n",
            "|    iterations         | 7100      |\n",
            "|    time_elapsed       | 147       |\n",
            "|    total_timesteps    | 35500     |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -41.9     |\n",
            "|    explained_variance | 0.13      |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 7099      |\n",
            "|    policy_loss        | 105       |\n",
            "|    reward             | 2.1522665 |\n",
            "|    std                | 1.03      |\n",
            "|    value_loss         | 8.78      |\n",
            "-------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 240        |\n",
            "|    iterations         | 7200       |\n",
            "|    time_elapsed       | 149        |\n",
            "|    total_timesteps    | 36000      |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -42        |\n",
            "|    explained_variance | 0.0669     |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 7199       |\n",
            "|    policy_loss        | -255       |\n",
            "|    reward             | -1.3714093 |\n",
            "|    std                | 1.03       |\n",
            "|    value_loss         | 36.1       |\n",
            "--------------------------------------\n",
            "---------------------------------------\n",
            "| time/                 |             |\n",
            "|    fps                | 240         |\n",
            "|    iterations         | 7300        |\n",
            "|    time_elapsed       | 151         |\n",
            "|    total_timesteps    | 36500       |\n",
            "| train/                |             |\n",
            "|    entropy_loss       | -41.9       |\n",
            "|    explained_variance | 0           |\n",
            "|    learning_rate      | 0.0007      |\n",
            "|    n_updates          | 7299        |\n",
            "|    policy_loss        | -146        |\n",
            "|    reward             | -0.76280195 |\n",
            "|    std                | 1.03        |\n",
            "|    value_loss         | 14.6        |\n",
            "---------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 240       |\n",
            "|    iterations         | 7400      |\n",
            "|    time_elapsed       | 153       |\n",
            "|    total_timesteps    | 37000     |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -42       |\n",
            "|    explained_variance | 0.237     |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 7399      |\n",
            "|    policy_loss        | 116       |\n",
            "|    reward             | -10.07473 |\n",
            "|    std                | 1.03      |\n",
            "|    value_loss         | 9.29      |\n",
            "-------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 240       |\n",
            "|    iterations         | 7500      |\n",
            "|    time_elapsed       | 155       |\n",
            "|    total_timesteps    | 37500     |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -42       |\n",
            "|    explained_variance | 0.0134    |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 7499      |\n",
            "|    policy_loss        | 197       |\n",
            "|    reward             | -7.212246 |\n",
            "|    std                | 1.03      |\n",
            "|    value_loss         | 34.2      |\n",
            "-------------------------------------\n",
            "------------------------------------\n",
            "| time/                 |          |\n",
            "|    fps                | 240      |\n",
            "|    iterations         | 7600     |\n",
            "|    time_elapsed       | 157      |\n",
            "|    total_timesteps    | 38000    |\n",
            "| train/                |          |\n",
            "|    entropy_loss       | -41.9    |\n",
            "|    explained_variance | 0        |\n",
            "|    learning_rate      | 0.0007   |\n",
            "|    n_updates          | 7599     |\n",
            "|    policy_loss        | -119     |\n",
            "|    reward             | 1.233767 |\n",
            "|    std                | 1.03     |\n",
            "|    value_loss         | 9.15     |\n",
            "------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 240       |\n",
            "|    iterations         | 7700      |\n",
            "|    time_elapsed       | 159       |\n",
            "|    total_timesteps    | 38500     |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -41.9     |\n",
            "|    explained_variance | 0         |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 7699      |\n",
            "|    policy_loss        | -246      |\n",
            "|    reward             | 1.5250477 |\n",
            "|    std                | 1.03      |\n",
            "|    value_loss         | 36        |\n",
            "-------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 240        |\n",
            "|    iterations         | 7800       |\n",
            "|    time_elapsed       | 162        |\n",
            "|    total_timesteps    | 39000      |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -41.9      |\n",
            "|    explained_variance | 0          |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 7799       |\n",
            "|    policy_loss        | -239       |\n",
            "|    reward             | -1.3093787 |\n",
            "|    std                | 1.03       |\n",
            "|    value_loss         | 36.1       |\n",
            "--------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 240       |\n",
            "|    iterations         | 7900      |\n",
            "|    time_elapsed       | 164       |\n",
            "|    total_timesteps    | 39500     |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -41.9     |\n",
            "|    explained_variance | 0         |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 7899      |\n",
            "|    policy_loss        | 295       |\n",
            "|    reward             | 6.1359696 |\n",
            "|    std                | 1.03      |\n",
            "|    value_loss         | 90        |\n",
            "-------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 240        |\n",
            "|    iterations         | 8000       |\n",
            "|    time_elapsed       | 166        |\n",
            "|    total_timesteps    | 40000      |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -41.9      |\n",
            "|    explained_variance | 0          |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 7999       |\n",
            "|    policy_loss        | -264       |\n",
            "|    reward             | -2.9735663 |\n",
            "|    std                | 1.03       |\n",
            "|    value_loss         | 63.6       |\n",
            "--------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 240       |\n",
            "|    iterations         | 8100      |\n",
            "|    time_elapsed       | 168       |\n",
            "|    total_timesteps    | 40500     |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -41.9     |\n",
            "|    explained_variance | 0         |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 8099      |\n",
            "|    policy_loss        | 359       |\n",
            "|    reward             | 5.4868984 |\n",
            "|    std                | 1.03      |\n",
            "|    value_loss         | 107       |\n",
            "-------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 240        |\n",
            "|    iterations         | 8200       |\n",
            "|    time_elapsed       | 170        |\n",
            "|    total_timesteps    | 41000      |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -41.9      |\n",
            "|    explained_variance | 0          |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 8199       |\n",
            "|    policy_loss        | -15.1      |\n",
            "|    reward             | 0.04860192 |\n",
            "|    std                | 1.03       |\n",
            "|    value_loss         | 1          |\n",
            "--------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 240        |\n",
            "|    iterations         | 8300       |\n",
            "|    time_elapsed       | 172        |\n",
            "|    total_timesteps    | 41500      |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -41.9      |\n",
            "|    explained_variance | 0          |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 8299       |\n",
            "|    policy_loss        | 154        |\n",
            "|    reward             | -1.7154597 |\n",
            "|    std                | 1.03       |\n",
            "|    value_loss         | 16.4       |\n",
            "--------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 241        |\n",
            "|    iterations         | 8400       |\n",
            "|    time_elapsed       | 174        |\n",
            "|    total_timesteps    | 42000      |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -41.8      |\n",
            "|    explained_variance | 5.96e-08   |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 8399       |\n",
            "|    policy_loss        | -176       |\n",
            "|    reward             | -3.0457652 |\n",
            "|    std                | 1.03       |\n",
            "|    value_loss         | 22.3       |\n",
            "--------------------------------------\n",
            "---------------------------------------\n",
            "| time/                 |             |\n",
            "|    fps                | 241         |\n",
            "|    iterations         | 8500        |\n",
            "|    time_elapsed       | 176         |\n",
            "|    total_timesteps    | 42500       |\n",
            "| train/                |             |\n",
            "|    entropy_loss       | -41.9       |\n",
            "|    explained_variance | 0           |\n",
            "|    learning_rate      | 0.0007      |\n",
            "|    n_updates          | 8499        |\n",
            "|    policy_loss        | -191        |\n",
            "|    reward             | -0.14926888 |\n",
            "|    std                | 1.03        |\n",
            "|    value_loss         | 21.6        |\n",
            "---------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 241        |\n",
            "|    iterations         | 8600       |\n",
            "|    time_elapsed       | 178        |\n",
            "|    total_timesteps    | 43000      |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -41.9      |\n",
            "|    explained_variance | -1.19e-07  |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 8599       |\n",
            "|    policy_loss        | 478        |\n",
            "|    reward             | -19.747784 |\n",
            "|    std                | 1.03       |\n",
            "|    value_loss         | 140        |\n",
            "--------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 241       |\n",
            "|    iterations         | 8700      |\n",
            "|    time_elapsed       | 180       |\n",
            "|    total_timesteps    | 43500     |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -41.9     |\n",
            "|    explained_variance | 0         |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 8699      |\n",
            "|    policy_loss        | -18.3     |\n",
            "|    reward             | 0.7188143 |\n",
            "|    std                | 1.03      |\n",
            "|    value_loss         | 2.23      |\n",
            "-------------------------------------\n",
            "---------------------------------------\n",
            "| time/                 |             |\n",
            "|    fps                | 241         |\n",
            "|    iterations         | 8800        |\n",
            "|    time_elapsed       | 182         |\n",
            "|    total_timesteps    | 44000       |\n",
            "| train/                |             |\n",
            "|    entropy_loss       | -41.9       |\n",
            "|    explained_variance | 0           |\n",
            "|    learning_rate      | 0.0007      |\n",
            "|    n_updates          | 8799        |\n",
            "|    policy_loss        | 33.2        |\n",
            "|    reward             | -0.06739279 |\n",
            "|    std                | 1.03        |\n",
            "|    value_loss         | 1.91        |\n",
            "---------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 241        |\n",
            "|    iterations         | 8900       |\n",
            "|    time_elapsed       | 184        |\n",
            "|    total_timesteps    | 44500      |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -41.9      |\n",
            "|    explained_variance | -1.19e-07  |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 8899       |\n",
            "|    policy_loss        | 28.6       |\n",
            "|    reward             | -1.5822812 |\n",
            "|    std                | 1.03       |\n",
            "|    value_loss         | 1.34       |\n",
            "--------------------------------------\n",
            "---------------------------------------\n",
            "| time/                 |             |\n",
            "|    fps                | 241         |\n",
            "|    iterations         | 9000        |\n",
            "|    time_elapsed       | 186         |\n",
            "|    total_timesteps    | 45000       |\n",
            "| train/                |             |\n",
            "|    entropy_loss       | -41.9       |\n",
            "|    explained_variance | 0           |\n",
            "|    learning_rate      | 0.0007      |\n",
            "|    n_updates          | 8999        |\n",
            "|    policy_loss        | -67.4       |\n",
            "|    reward             | -0.18698789 |\n",
            "|    std                | 1.03        |\n",
            "|    value_loss         | 6.92        |\n",
            "---------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 241       |\n",
            "|    iterations         | 9100      |\n",
            "|    time_elapsed       | 188       |\n",
            "|    total_timesteps    | 45500     |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -41.9     |\n",
            "|    explained_variance | 5.96e-08  |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 9099      |\n",
            "|    policy_loss        | 16        |\n",
            "|    reward             | 2.2883642 |\n",
            "|    std                | 1.03      |\n",
            "|    value_loss         | 0.313     |\n",
            "-------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 241       |\n",
            "|    iterations         | 9200      |\n",
            "|    time_elapsed       | 190       |\n",
            "|    total_timesteps    | 46000     |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -41.9     |\n",
            "|    explained_variance | 0         |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 9199      |\n",
            "|    policy_loss        | -200      |\n",
            "|    reward             | 3.5835774 |\n",
            "|    std                | 1.03      |\n",
            "|    value_loss         | 60.2      |\n",
            "-------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 241        |\n",
            "|    iterations         | 9300       |\n",
            "|    time_elapsed       | 192        |\n",
            "|    total_timesteps    | 46500      |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -41.9      |\n",
            "|    explained_variance | 0          |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 9299       |\n",
            "|    policy_loss        | -183       |\n",
            "|    reward             | 0.30612326 |\n",
            "|    std                | 1.03       |\n",
            "|    value_loss         | 23.2       |\n",
            "--------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 241        |\n",
            "|    iterations         | 9400       |\n",
            "|    time_elapsed       | 194        |\n",
            "|    total_timesteps    | 47000      |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -42        |\n",
            "|    explained_variance | -1.19e-07  |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 9399       |\n",
            "|    policy_loss        | 175        |\n",
            "|    reward             | 0.48061144 |\n",
            "|    std                | 1.03       |\n",
            "|    value_loss         | 20         |\n",
            "--------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 241        |\n",
            "|    iterations         | 9500       |\n",
            "|    time_elapsed       | 196        |\n",
            "|    total_timesteps    | 47500      |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -42        |\n",
            "|    explained_variance | 5.96e-08   |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 9499       |\n",
            "|    policy_loss        | 54.2       |\n",
            "|    reward             | 0.24561676 |\n",
            "|    std                | 1.03       |\n",
            "|    value_loss         | 6.82       |\n",
            "--------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 241        |\n",
            "|    iterations         | 9600       |\n",
            "|    time_elapsed       | 198        |\n",
            "|    total_timesteps    | 48000      |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -41.9      |\n",
            "|    explained_variance | -1.19e-07  |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 9599       |\n",
            "|    policy_loss        | -110       |\n",
            "|    reward             | -2.3700995 |\n",
            "|    std                | 1.03       |\n",
            "|    value_loss         | 12         |\n",
            "--------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 241        |\n",
            "|    iterations         | 9700       |\n",
            "|    time_elapsed       | 200        |\n",
            "|    total_timesteps    | 48500      |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -41.9      |\n",
            "|    explained_variance | 0          |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 9699       |\n",
            "|    policy_loss        | -111       |\n",
            "|    reward             | 0.64502627 |\n",
            "|    std                | 1.03       |\n",
            "|    value_loss         | 7.2        |\n",
            "--------------------------------------\n",
            "-------------------------------------\n",
            "| time/                 |           |\n",
            "|    fps                | 241       |\n",
            "|    iterations         | 9800      |\n",
            "|    time_elapsed       | 202       |\n",
            "|    total_timesteps    | 49000     |\n",
            "| train/                |           |\n",
            "|    entropy_loss       | -41.9     |\n",
            "|    explained_variance | 0         |\n",
            "|    learning_rate      | 0.0007    |\n",
            "|    n_updates          | 9799      |\n",
            "|    policy_loss        | 39        |\n",
            "|    reward             | 7.1939497 |\n",
            "|    std                | 1.03      |\n",
            "|    value_loss         | 60.2      |\n",
            "-------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 241        |\n",
            "|    iterations         | 9900       |\n",
            "|    time_elapsed       | 204        |\n",
            "|    total_timesteps    | 49500      |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -41.9      |\n",
            "|    explained_variance | 0          |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 9899       |\n",
            "|    policy_loss        | 48         |\n",
            "|    reward             | 0.46383533 |\n",
            "|    std                | 1.03       |\n",
            "|    value_loss         | 1.74       |\n",
            "--------------------------------------\n",
            "--------------------------------------\n",
            "| time/                 |            |\n",
            "|    fps                | 241        |\n",
            "|    iterations         | 10000      |\n",
            "|    time_elapsed       | 206        |\n",
            "|    total_timesteps    | 50000      |\n",
            "| train/                |            |\n",
            "|    entropy_loss       | -41.9      |\n",
            "|    explained_variance | 0          |\n",
            "|    learning_rate      | 0.0007     |\n",
            "|    n_updates          | 9999       |\n",
            "|    policy_loss        | 24.1       |\n",
            "|    reward             | -0.3937383 |\n",
            "|    std                | 1.03       |\n",
            "|    value_loss         | 2.78       |\n",
            "--------------------------------------\n"
          ]
        }
      ],
      "source": [
        "trained_a2c = agent.train_model(model=model_a2c,\n",
        "                                tb_log_name='a2c',\n",
        "                                total_timesteps=50000) if if_using_a2c else None"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 56,
      "metadata": {
        "id": "zjCWfgsg3sVa"
      },
      "outputs": [],
      "source": [
        "if if_using_a2c:\n",
        "    trained_a2c.save(TRAINED_MODEL_DIR + \"/agent_a2c\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "MRiOtrywfAo1"
      },
      "source": [
        "### Agent 2: DDPG"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 57,
      "metadata": {
        "id": "M2YadjfnLwgt"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "{'batch_size': 128, 'buffer_size': 50000, 'learning_rate': 0.001}\n",
            "Using cpu device\n",
            "Logging to results/ddpg\n"
          ]
        }
      ],
      "source": [
        "agent = DRLAgent(env=env_train)\n",
        "model_ddpg = agent.get_model(\"ddpg\")\n",
        "\n",
        "if if_using_ddpg:\n",
        "    # route DDPG training logs to stdout, CSV, and TensorBoard\n",
        "    tmp_path = RESULTS_DIR + '/ddpg'\n",
        "    new_logger_ddpg = configure(tmp_path, [\"stdout\", \"csv\", \"tensorboard\"])\n",
        "    model_ddpg.set_logger(new_logger_ddpg)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 58,
      "metadata": {
        "id": "tCDa78rqfO_a"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "day: 2892, episode: 20\n",
            "begin_total_asset: 1000000.00\n",
            "end_total_asset: 5320498.13\n",
            "total_reward: 4320498.13\n",
            "total_cost: 4980.45\n",
            "total_trades: 62355\n",
            "Sharpe: 0.940\n",
            "=================================\n",
            "----------------------------------\n",
            "| time/              |           |\n",
            "|    episodes        | 4         |\n",
            "|    fps             | 128       |\n",
            "|    time_elapsed    | 89        |\n",
            "|    total_timesteps | 11572     |\n",
            "| train/             |           |\n",
            "|    actor_loss      | -88.1     |\n",
            "|    critic_loss     | 3.15e+03  |\n",
            "|    learning_rate   | 0.001     |\n",
            "|    n_updates       | 8679      |\n",
            "|    reward          | 3.6435823 |\n",
            "----------------------------------\n",
            "----------------------------------\n",
            "| time/              |           |\n",
            "|    episodes        | 8         |\n",
            "|    fps             | 118       |\n",
            "|    time_elapsed    | 196       |\n",
            "|    total_timesteps | 23144     |\n",
            "| train/             |           |\n",
            "|    actor_loss      | 5.26      |\n",
            "|    critic_loss     | 18.5      |\n",
            "|    learning_rate   | 0.001     |\n",
            "|    n_updates       | 20251     |\n",
            "|    reward          | 3.6435823 |\n",
            "----------------------------------\n",
            "day: 2892, episode: 30\n",
            "begin_total_asset: 1000000.00\n",
            "end_total_asset: 5653037.27\n",
            "total_reward: 4653037.27\n",
            "total_cost: 999.00\n",
            "total_trades: 52056\n",
            "Sharpe: 0.908\n",
            "=================================\n",
            "----------------------------------\n",
            "| time/              |           |\n",
            "|    episodes        | 12        |\n",
            "|    fps             | 114       |\n",
            "|    time_elapsed    | 302       |\n",
            "|    total_timesteps | 34716     |\n",
            "| train/             |           |\n",
            "|    actor_loss      | -3.28     |\n",
            "|    critic_loss     | 7.77      |\n",
            "|    learning_rate   | 0.001     |\n",
            "|    n_updates       | 31823     |\n",
            "|    reward          | 3.6435823 |\n",
            "----------------------------------\n",
            "----------------------------------\n",
            "| time/              |           |\n",
            "|    episodes        | 16        |\n",
            "|    fps             | 113       |\n",
            "|    time_elapsed    | 407       |\n",
            "|    total_timesteps | 46288     |\n",
            "| train/             |           |\n",
            "|    actor_loss      | -8.99     |\n",
            "|    critic_loss     | 4.52      |\n",
            "|    learning_rate   | 0.001     |\n",
            "|    n_updates       | 43395     |\n",
            "|    reward          | 3.6435823 |\n",
            "----------------------------------\n"
          ]
        }
      ],
      "source": [
        "trained_ddpg = agent.train_model(model=model_ddpg,\n",
        "                                 tb_log_name='ddpg',\n",
        "                                 total_timesteps=50000) if if_using_ddpg else None"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 59,
      "metadata": {
        "id": "ne6M2R-WvrUQ"
      },
      "outputs": [],
      "source": [
        "if if_using_ddpg:\n",
        "    trained_ddpg.save(TRAINED_MODEL_DIR + \"/agent_ddpg\")"
      ]
    },
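    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The episode summaries in the training logs report a Sharpe ratio (e.g. `Sharpe: 0.940`). As a rough sketch (assuming the environment annualizes daily portfolio returns over roughly 252 trading days with the risk-free rate taken as zero; the exact computation inside the environment may differ), it can be computed as:\n",
        "\n",
        "```python\n",
        "import statistics\n",
        "\n",
        "def sharpe_ratio(daily_returns, periods_per_year=252):\n",
        "    \"\"\"Annualized Sharpe ratio, risk-free rate assumed 0 (illustrative sketch).\"\"\"\n",
        "    mean_r = statistics.mean(daily_returns)\n",
        "    std_r = statistics.stdev(daily_returns)  # sample standard deviation\n",
        "    return (periods_per_year ** 0.5) * mean_r / std_r\n",
        "\n",
        "# toy example: three days of portfolio returns\n",
        "print(sharpe_ratio([0.01, -0.005, 0.02]))\n",
        "```"
      ]
    },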
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_gDkU-j-fCmZ"
      },
      "source": [
        "### Agent 3: PPO"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 60,
      "metadata": {
        "id": "y5D5PFUhMzSV"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "{'n_steps': 2048, 'ent_coef': 0.01, 'learning_rate': 0.00025, 'batch_size': 128}\n",
            "Using cpu device\n",
            "Logging to results/ppo\n"
          ]
        }
      ],
      "source": [
        "agent = DRLAgent(env=env_train)\n",
        "PPO_PARAMS = {\n",
        "    \"n_steps\": 2048,\n",
        "    \"ent_coef\": 0.01,\n",
        "    \"learning_rate\": 0.00025,\n",
        "    \"batch_size\": 128,\n",
        "}\n",
        "model_ppo = agent.get_model(\"ppo\", model_kwargs=PPO_PARAMS)\n",
        "\n",
        "if if_using_ppo:\n",
        "    # route PPO training logs to stdout, CSV, and TensorBoard\n",
        "    tmp_path = RESULTS_DIR + '/ppo'\n",
        "    new_logger_ppo = configure(tmp_path, [\"stdout\", \"csv\", \"tensorboard\"])\n",
        "    model_ppo.set_logger(new_logger_ppo)"
      ]
    },
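    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The PPO training logs report `clip_range` and `clip_fraction`. As a minimal per-sample sketch of the standard PPO clipped surrogate objective (which the trainer maximizes; `ratio` is the new-to-old policy probability ratio and `advantage` the estimated advantage):\n",
        "\n",
        "```python\n",
        "def ppo_clip_objective(ratio, advantage, clip_range=0.2):\n",
        "    \"\"\"Clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A).\"\"\"\n",
        "    clipped_ratio = max(min(ratio, 1 + clip_range), 1 - clip_range)\n",
        "    return min(ratio * advantage, clipped_ratio * advantage)\n",
        "\n",
        "# a ratio of 1.5 with positive advantage is clipped at 1 + 0.2\n",
        "print(ppo_clip_objective(1.5, 1.0))\n",
        "```"
      ]
    },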
    {
      "cell_type": "code",
      "execution_count": 61,
      "metadata": {
        "id": "Gt8eIQKYM4G3"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "-----------------------------------\n",
            "| time/              |            |\n",
            "|    fps             | 273        |\n",
            "|    iterations      | 1          |\n",
            "|    time_elapsed    | 7          |\n",
            "|    total_timesteps | 2048       |\n",
            "| train/             |            |\n",
            "|    reward          | 0.08926041 |\n",
            "-----------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 268         |\n",
            "|    iterations           | 2           |\n",
            "|    time_elapsed         | 15          |\n",
            "|    total_timesteps      | 4096        |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.014204679 |\n",
            "|    clip_fraction        | 0.204       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -41.2       |\n",
            "|    explained_variance   | 0.00689     |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 5.49        |\n",
            "|    n_updates            | 10          |\n",
            "|    policy_gradient_loss | -0.0299     |\n",
            "|    reward               | 0.86135125  |\n",
            "|    std                  | 1           |\n",
            "|    value_loss           | 17.8        |\n",
            "-----------------------------------------\n",
            "day: 2892, episode: 40\n",
            "begin_total_asset: 1000000.00\n",
            "end_total_asset: 2684399.34\n",
            "total_reward: 1684399.34\n",
            "total_cost: 331378.46\n",
            "total_trades: 80317\n",
            "Sharpe: 0.570\n",
            "=================================\n",
            "----------------------------------------\n",
            "| time/                   |            |\n",
            "|    fps                  | 266        |\n",
            "|    iterations           | 3          |\n",
            "|    time_elapsed         | 23         |\n",
            "|    total_timesteps      | 6144       |\n",
            "| train/                  |            |\n",
            "|    approx_kl            | 0.01174316 |\n",
            "|    clip_fraction        | 0.115      |\n",
            "|    clip_range           | 0.2        |\n",
            "|    entropy_loss         | -41.2      |\n",
            "|    explained_variance   | 0.00352    |\n",
            "|    learning_rate        | 0.00025    |\n",
            "|    loss                 | 25         |\n",
            "|    n_updates            | 20         |\n",
            "|    policy_gradient_loss | -0.0172    |\n",
            "|    reward               | -1.2826169 |\n",
            "|    std                  | 1          |\n",
            "|    value_loss           | 53.8       |\n",
            "----------------------------------------\n",
            "----------------------------------------\n",
            "| time/                   |            |\n",
            "|    fps                  | 265        |\n",
            "|    iterations           | 4          |\n",
            "|    time_elapsed         | 30         |\n",
            "|    total_timesteps      | 8192       |\n",
            "| train/                  |            |\n",
            "|    approx_kl            | 0.01867792 |\n",
            "|    clip_fraction        | 0.227      |\n",
            "|    clip_range           | 0.2        |\n",
            "|    entropy_loss         | -41.3      |\n",
            "|    explained_variance   | 0.00243    |\n",
            "|    learning_rate        | 0.00025    |\n",
            "|    loss                 | 27         |\n",
            "|    n_updates            | 30         |\n",
            "|    policy_gradient_loss | -0.0165    |\n",
            "|    reward               | 1.8152547  |\n",
            "|    std                  | 1.01       |\n",
            "|    value_loss           | 48.4       |\n",
            "----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 260         |\n",
            "|    iterations           | 5           |\n",
            "|    time_elapsed         | 39          |\n",
            "|    total_timesteps      | 10240       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.018921016 |\n",
            "|    clip_fraction        | 0.16        |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -41.3       |\n",
            "|    explained_variance   | -0.0416     |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 5.18        |\n",
            "|    n_updates            | 40          |\n",
            "|    policy_gradient_loss | -0.022      |\n",
            "|    reward               | 2.7809274   |\n",
            "|    std                  | 1.01        |\n",
            "|    value_loss           | 12.4        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 260         |\n",
            "|    iterations           | 6           |\n",
            "|    time_elapsed         | 47          |\n",
            "|    total_timesteps      | 12288       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.016252043 |\n",
            "|    clip_fraction        | 0.196       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -41.4       |\n",
            "|    explained_variance   | 0.0182      |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 5.79        |\n",
            "|    n_updates            | 50          |\n",
            "|    policy_gradient_loss | -0.0219     |\n",
            "|    reward               | 2.1067123   |\n",
            "|    std                  | 1.01        |\n",
            "|    value_loss           | 25.7        |\n",
            "-----------------------------------------\n",
            "----------------------------------------\n",
            "| time/                   |            |\n",
            "|    fps                  | 260        |\n",
            "|    iterations           | 7          |\n",
            "|    time_elapsed         | 54         |\n",
            "|    total_timesteps      | 14336      |\n",
            "| train/                  |            |\n",
            "|    approx_kl            | 0.02009923 |\n",
            "|    clip_fraction        | 0.27       |\n",
            "|    clip_range           | 0.2        |\n",
            "|    entropy_loss         | -41.4      |\n",
            "|    explained_variance   | 0.0129     |\n",
            "|    learning_rate        | 0.00025    |\n",
            "|    loss                 | 19.4       |\n",
            "|    n_updates            | 60         |\n",
            "|    policy_gradient_loss | -0.0179    |\n",
            "|    reward               | 1.1840192  |\n",
            "|    std                  | 1.01       |\n",
            "|    value_loss           | 47.4       |\n",
            "----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 261         |\n",
            "|    iterations           | 8           |\n",
            "|    time_elapsed         | 62          |\n",
            "|    total_timesteps      | 16384       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.018546926 |\n",
            "|    clip_fraction        | 0.222       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -41.5       |\n",
            "|    explained_variance   | -0.00519    |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 10.9        |\n",
            "|    n_updates            | 70          |\n",
            "|    policy_gradient_loss | -0.0157     |\n",
            "|    reward               | 0.9240639   |\n",
            "|    std                  | 1.01        |\n",
            "|    value_loss           | 27.3        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 261         |\n",
            "|    iterations           | 9           |\n",
            "|    time_elapsed         | 70          |\n",
            "|    total_timesteps      | 18432       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.016806979 |\n",
            "|    clip_fraction        | 0.141       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -41.6       |\n",
            "|    explained_variance   | -0.00785    |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 21.8        |\n",
            "|    n_updates            | 80          |\n",
            "|    policy_gradient_loss | -0.0131     |\n",
            "|    reward               | 0.8015673   |\n",
            "|    std                  | 1.01        |\n",
            "|    value_loss           | 50.2        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 261         |\n",
            "|    iterations           | 10          |\n",
            "|    time_elapsed         | 78          |\n",
            "|    total_timesteps      | 20480       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.023602227 |\n",
            "|    clip_fraction        | 0.237       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -41.6       |\n",
            "|    explained_variance   | 0.00122     |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 20.6        |\n",
            "|    n_updates            | 90          |\n",
            "|    policy_gradient_loss | -0.0167     |\n",
            "|    reward               | 0.78456616  |\n",
            "|    std                  | 1.02        |\n",
            "|    value_loss           | 46.4        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 261         |\n",
            "|    iterations           | 11          |\n",
            "|    time_elapsed         | 86          |\n",
            "|    total_timesteps      | 22528       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.026628345 |\n",
            "|    clip_fraction        | 0.309       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -41.7       |\n",
            "|    explained_variance   | 0.00502     |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 21.2        |\n",
            "|    n_updates            | 100         |\n",
            "|    policy_gradient_loss | -0.013      |\n",
            "|    reward               | 4.0240583   |\n",
            "|    std                  | 1.02        |\n",
            "|    value_loss           | 69.3        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 260         |\n",
            "|    iterations           | 12          |\n",
            "|    time_elapsed         | 94          |\n",
            "|    total_timesteps      | 24576       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.021350745 |\n",
            "|    clip_fraction        | 0.213       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -41.7       |\n",
            "|    explained_variance   | -0.0193     |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 9.15        |\n",
            "|    n_updates            | 110         |\n",
            "|    policy_gradient_loss | -0.0169     |\n",
            "|    reward               | -0.2795613  |\n",
            "|    std                  | 1.02        |\n",
            "|    value_loss           | 19.3        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 259         |\n",
            "|    iterations           | 13          |\n",
            "|    time_elapsed         | 102         |\n",
            "|    total_timesteps      | 26624       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.029555509 |\n",
            "|    clip_fraction        | 0.315       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -41.8       |\n",
            "|    explained_variance   | -0.0286     |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 39.8        |\n",
            "|    n_updates            | 120         |\n",
            "|    policy_gradient_loss | -0.0192     |\n",
            "|    reward               | 0.06544654  |\n",
            "|    std                  | 1.02        |\n",
            "|    value_loss           | 67          |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 258         |\n",
            "|    iterations           | 14          |\n",
            "|    time_elapsed         | 110         |\n",
            "|    total_timesteps      | 28672       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.024018355 |\n",
            "|    clip_fraction        | 0.269       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -41.8       |\n",
            "|    explained_variance   | -0.0201     |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 72.6        |\n",
            "|    n_updates            | 130         |\n",
            "|    policy_gradient_loss | -0.0177     |\n",
            "|    reward               | 0.39938167  |\n",
            "|    std                  | 1.02        |\n",
            "|    value_loss           | 116         |\n",
            "-----------------------------------------\n",
            "----------------------------------------\n",
            "| time/                   |            |\n",
            "|    fps                  | 258        |\n",
            "|    iterations           | 15         |\n",
            "|    time_elapsed         | 119        |\n",
            "|    total_timesteps      | 30720      |\n",
            "| train/                  |            |\n",
            "|    approx_kl            | 0.03237196 |\n",
            "|    clip_fraction        | 0.3        |\n",
            "|    clip_range           | 0.2        |\n",
            "|    entropy_loss         | -41.9      |\n",
            "|    explained_variance   | 0.000251   |\n",
            "|    learning_rate        | 0.00025    |\n",
            "|    loss                 | 15.3       |\n",
            "|    n_updates            | 140        |\n",
            "|    policy_gradient_loss | -0.0196    |\n",
            "|    reward               | 4.184968   |\n",
            "|    std                  | 1.03       |\n",
            "|    value_loss           | 23.8       |\n",
            "----------------------------------------\n",
            "------------------------------------------\n",
            "| time/                   |              |\n",
            "|    fps                  | 257          |\n",
            "|    iterations           | 16           |\n",
            "|    time_elapsed         | 127          |\n",
            "|    total_timesteps      | 32768        |\n",
            "| train/                  |              |\n",
            "|    approx_kl            | 0.025582202  |\n",
            "|    clip_fraction        | 0.236        |\n",
            "|    clip_range           | 0.2          |\n",
            "|    entropy_loss         | -41.9        |\n",
            "|    explained_variance   | 0.00314      |\n",
            "|    learning_rate        | 0.00025      |\n",
            "|    loss                 | 8.16         |\n",
            "|    n_updates            | 150          |\n",
            "|    policy_gradient_loss | -0.0126      |\n",
            "|    reward               | -0.038666666 |\n",
            "|    std                  | 1.03         |\n",
            "|    value_loss           | 50           |\n",
            "------------------------------------------\n",
            "day: 2892, episode: 50\n",
            "begin_total_asset: 1000000.00\n",
            "end_total_asset: 4402319.13\n",
            "total_reward: 3402319.13\n",
            "total_cost: 310061.56\n",
            "total_trades: 78421\n",
            "Sharpe: 0.834\n",
            "=================================\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 257         |\n",
            "|    iterations           | 17          |\n",
            "|    time_elapsed         | 134         |\n",
            "|    total_timesteps      | 34816       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.024341801 |\n",
            "|    clip_fraction        | 0.257       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -42         |\n",
            "|    explained_variance   | -0.00145    |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 35.8        |\n",
            "|    n_updates            | 160         |\n",
            "|    policy_gradient_loss | -0.0144     |\n",
            "|    reward               | 0.61966515  |\n",
            "|    std                  | 1.03        |\n",
            "|    value_loss           | 71          |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 258         |\n",
            "|    iterations           | 18          |\n",
            "|    time_elapsed         | 142         |\n",
            "|    total_timesteps      | 36864       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.029438617 |\n",
            "|    clip_fraction        | 0.25        |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -42         |\n",
            "|    explained_variance   | -0.0133     |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 24.8        |\n",
            "|    n_updates            | 170         |\n",
            "|    policy_gradient_loss | -0.0145     |\n",
            "|    reward               | -0.545224   |\n",
            "|    std                  | 1.03        |\n",
            "|    value_loss           | 78.3        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 258         |\n",
            "|    iterations           | 19          |\n",
            "|    time_elapsed         | 150         |\n",
            "|    total_timesteps      | 38912       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.025373695 |\n",
            "|    clip_fraction        | 0.222       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -42.1       |\n",
            "|    explained_variance   | -0.0339     |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 6.23        |\n",
            "|    n_updates            | 180         |\n",
            "|    policy_gradient_loss | -0.016      |\n",
            "|    reward               | 0.40111333  |\n",
            "|    std                  | 1.03        |\n",
            "|    value_loss           | 16          |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 258         |\n",
            "|    iterations           | 20          |\n",
            "|    time_elapsed         | 158         |\n",
            "|    total_timesteps      | 40960       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.030494865 |\n",
            "|    clip_fraction        | 0.242       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -42.1       |\n",
            "|    explained_variance   | 0.0116      |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 10          |\n",
            "|    n_updates            | 190         |\n",
            "|    policy_gradient_loss | -0.0121     |\n",
            "|    reward               | -0.35586402 |\n",
            "|    std                  | 1.03        |\n",
            "|    value_loss           | 42          |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 258         |\n",
            "|    iterations           | 21          |\n",
            "|    time_elapsed         | 166         |\n",
            "|    total_timesteps      | 43008       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.035696197 |\n",
            "|    clip_fraction        | 0.34        |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -42.1       |\n",
            "|    explained_variance   | 0.0109      |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 33.6        |\n",
            "|    n_updates            | 200         |\n",
            "|    policy_gradient_loss | -0.00356    |\n",
            "|    reward               | -10.722384  |\n",
            "|    std                  | 1.04        |\n",
            "|    value_loss           | 59.1        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 258         |\n",
            "|    iterations           | 22          |\n",
            "|    time_elapsed         | 174         |\n",
            "|    total_timesteps      | 45056       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.025370643 |\n",
            "|    clip_fraction        | 0.226       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -42.2       |\n",
            "|    explained_variance   | 0.00307     |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 9.62        |\n",
            "|    n_updates            | 210         |\n",
            "|    policy_gradient_loss | -0.0147     |\n",
            "|    reward               | 3.6796777   |\n",
            "|    std                  | 1.04        |\n",
            "|    value_loss           | 24          |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 258         |\n",
            "|    iterations           | 23          |\n",
            "|    time_elapsed         | 182         |\n",
            "|    total_timesteps      | 47104       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.024651708 |\n",
            "|    clip_fraction        | 0.249       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -42.2       |\n",
            "|    explained_variance   | 0.0217      |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 11          |\n",
            "|    n_updates            | 220         |\n",
            "|    policy_gradient_loss | -0.0106     |\n",
            "|    reward               | 1.1396157   |\n",
            "|    std                  | 1.04        |\n",
            "|    value_loss           | 68.1        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 258         |\n",
            "|    iterations           | 24          |\n",
            "|    time_elapsed         | 190         |\n",
            "|    total_timesteps      | 49152       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.019927632 |\n",
            "|    clip_fraction        | 0.212       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -42.2       |\n",
            "|    explained_variance   | -0.000347   |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 21.5        |\n",
            "|    n_updates            | 230         |\n",
            "|    policy_gradient_loss | -0.0139     |\n",
            "|    reward               | 4.7052965   |\n",
            "|    std                  | 1.04        |\n",
            "|    value_loss           | 77.2        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 258         |\n",
            "|    iterations           | 25          |\n",
            "|    time_elapsed         | 198         |\n",
            "|    total_timesteps      | 51200       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.025366351 |\n",
            "|    clip_fraction        | 0.252       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -42.3       |\n",
            "|    explained_variance   | -0.00108    |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 25.5        |\n",
            "|    n_updates            | 240         |\n",
            "|    policy_gradient_loss | -0.00781    |\n",
            "|    reward               | -1.1468652  |\n",
            "|    std                  | 1.04        |\n",
            "|    value_loss           | 50.4        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 258         |\n",
            "|    iterations           | 26          |\n",
            "|    time_elapsed         | 205         |\n",
            "|    total_timesteps      | 53248       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.025304053 |\n",
            "|    clip_fraction        | 0.266       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -42.4       |\n",
            "|    explained_variance   | 0.00155     |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 8.08        |\n",
            "|    n_updates            | 250         |\n",
            "|    policy_gradient_loss | -0.0134     |\n",
            "|    reward               | 0.7457433   |\n",
            "|    std                  | 1.04        |\n",
            "|    value_loss           | 18.9        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 258         |\n",
            "|    iterations           | 27          |\n",
            "|    time_elapsed         | 213         |\n",
            "|    total_timesteps      | 55296       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.018584128 |\n",
            "|    clip_fraction        | 0.202       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -42.4       |\n",
            "|    explained_variance   | 0.0232      |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 40.8        |\n",
            "|    n_updates            | 260         |\n",
            "|    policy_gradient_loss | -0.0114     |\n",
            "|    reward               | -0.30097428 |\n",
            "|    std                  | 1.04        |\n",
            "|    value_loss           | 104         |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 258         |\n",
            "|    iterations           | 28          |\n",
            "|    time_elapsed         | 221         |\n",
            "|    total_timesteps      | 57344       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.028224822 |\n",
            "|    clip_fraction        | 0.25        |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -42.4       |\n",
            "|    explained_variance   | 0.0121      |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 44.8        |\n",
            "|    n_updates            | 270         |\n",
            "|    policy_gradient_loss | -0.00742    |\n",
            "|    reward               | -0.40086997 |\n",
            "|    std                  | 1.05        |\n",
            "|    value_loss           | 96.1        |\n",
            "-----------------------------------------\n",
            "----------------------------------------\n",
            "| time/                   |            |\n",
            "|    fps                  | 258        |\n",
            "|    iterations           | 29         |\n",
            "|    time_elapsed         | 229        |\n",
            "|    total_timesteps      | 59392      |\n",
            "| train/                  |            |\n",
            "|    approx_kl            | 0.0316035  |\n",
            "|    clip_fraction        | 0.281      |\n",
            "|    clip_range           | 0.2        |\n",
            "|    entropy_loss         | -42.5      |\n",
            "|    explained_variance   | 0.0386     |\n",
            "|    learning_rate        | 0.00025    |\n",
            "|    loss                 | 12.4       |\n",
            "|    n_updates            | 280        |\n",
            "|    policy_gradient_loss | -0.011     |\n",
            "|    reward               | -1.3378962 |\n",
            "|    std                  | 1.05       |\n",
            "|    value_loss           | 24.6       |\n",
            "----------------------------------------\n",
            "----------------------------------------\n",
            "| time/                   |            |\n",
            "|    fps                  | 258        |\n",
            "|    iterations           | 30         |\n",
            "|    time_elapsed         | 237        |\n",
            "|    total_timesteps      | 61440      |\n",
            "| train/                  |            |\n",
            "|    approx_kl            | 0.02995362 |\n",
            "|    clip_fraction        | 0.211      |\n",
            "|    clip_range           | 0.2        |\n",
            "|    entropy_loss         | -42.5      |\n",
            "|    explained_variance   | 0.0289     |\n",
            "|    learning_rate        | 0.00025    |\n",
            "|    loss                 | 31.6       |\n",
            "|    n_updates            | 290        |\n",
            "|    policy_gradient_loss | -0.0127    |\n",
            "|    reward               | 1.266873   |\n",
            "|    std                  | 1.05       |\n",
            "|    value_loss           | 84.5       |\n",
            "----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 258         |\n",
            "|    iterations           | 31          |\n",
            "|    time_elapsed         | 245         |\n",
            "|    total_timesteps      | 63488       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.028202966 |\n",
            "|    clip_fraction        | 0.273       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -42.6       |\n",
            "|    explained_variance   | 0.0311      |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 36          |\n",
            "|    n_updates            | 300         |\n",
            "|    policy_gradient_loss | -0.00526    |\n",
            "|    reward               | 3.944501    |\n",
            "|    std                  | 1.05        |\n",
            "|    value_loss           | 102         |\n",
            "-----------------------------------------\n",
            "day: 2892, episode: 60\n",
            "begin_total_asset: 1000000.00\n",
            "end_total_asset: 4789693.87\n",
            "total_reward: 3789693.87\n",
            "total_cost: 273702.02\n",
            "total_trades: 76548\n",
            "Sharpe: 0.848\n",
            "=================================\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 258         |\n",
            "|    iterations           | 32          |\n",
            "|    time_elapsed         | 253         |\n",
            "|    total_timesteps      | 65536       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.034777187 |\n",
            "|    clip_fraction        | 0.334       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -42.6       |\n",
            "|    explained_variance   | 0.00553     |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 26.9        |\n",
            "|    n_updates            | 310         |\n",
            "|    policy_gradient_loss | -0.00939    |\n",
            "|    reward               | 0.31705695  |\n",
            "|    std                  | 1.05        |\n",
            "|    value_loss           | 44.9        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 258         |\n",
            "|    iterations           | 33          |\n",
            "|    time_elapsed         | 261         |\n",
            "|    total_timesteps      | 67584       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.023330035 |\n",
            "|    clip_fraction        | 0.219       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -42.7       |\n",
            "|    explained_variance   | 0.0156      |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 43.8        |\n",
            "|    n_updates            | 320         |\n",
            "|    policy_gradient_loss | -0.0149     |\n",
            "|    reward               | -0.3661035  |\n",
            "|    std                  | 1.06        |\n",
            "|    value_loss           | 85.3        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 258         |\n",
            "|    iterations           | 34          |\n",
            "|    time_elapsed         | 269         |\n",
            "|    total_timesteps      | 69632       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.027804123 |\n",
            "|    clip_fraction        | 0.265       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -42.7       |\n",
            "|    explained_variance   | 0.0626      |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 77.7        |\n",
            "|    n_updates            | 330         |\n",
            "|    policy_gradient_loss | -0.0105     |\n",
            "|    reward               | 1.2939492   |\n",
            "|    std                  | 1.06        |\n",
            "|    value_loss           | 140         |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 258         |\n",
            "|    iterations           | 35          |\n",
            "|    time_elapsed         | 276         |\n",
            "|    total_timesteps      | 71680       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.023427177 |\n",
            "|    clip_fraction        | 0.214       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -42.8       |\n",
            "|    explained_variance   | 0.0209      |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 62.5        |\n",
            "|    n_updates            | 340         |\n",
            "|    policy_gradient_loss | -0.00304    |\n",
            "|    reward               | -1.0734715  |\n",
            "|    std                  | 1.06        |\n",
            "|    value_loss           | 86.3        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 258         |\n",
            "|    iterations           | 36          |\n",
            "|    time_elapsed         | 284         |\n",
            "|    total_timesteps      | 73728       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.022760246 |\n",
            "|    clip_fraction        | 0.199       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -42.8       |\n",
            "|    explained_variance   | 0.117       |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 9.04        |\n",
            "|    n_updates            | 350         |\n",
            "|    policy_gradient_loss | -0.00747    |\n",
            "|    reward               | -6.45271    |\n",
            "|    std                  | 1.06        |\n",
            "|    value_loss           | 19.5        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 258         |\n",
            "|    iterations           | 37          |\n",
            "|    time_elapsed         | 292         |\n",
            "|    total_timesteps      | 75776       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.025826197 |\n",
            "|    clip_fraction        | 0.249       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -42.9       |\n",
            "|    explained_variance   | 0.0758      |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 60          |\n",
            "|    n_updates            | 360         |\n",
            "|    policy_gradient_loss | -0.0121     |\n",
            "|    reward               | -1.350588   |\n",
            "|    std                  | 1.06        |\n",
            "|    value_loss           | 88.6        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 258         |\n",
            "|    iterations           | 38          |\n",
            "|    time_elapsed         | 300         |\n",
            "|    total_timesteps      | 77824       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.031001095 |\n",
            "|    clip_fraction        | 0.32        |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -43         |\n",
            "|    explained_variance   | 0.00157     |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 84.8        |\n",
            "|    n_updates            | 370         |\n",
            "|    policy_gradient_loss | -0.00227    |\n",
            "|    reward               | -11.030183  |\n",
            "|    std                  | 1.07        |\n",
            "|    value_loss           | 101         |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 258         |\n",
            "|    iterations           | 39          |\n",
            "|    time_elapsed         | 308         |\n",
            "|    total_timesteps      | 79872       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.029696299 |\n",
            "|    clip_fraction        | 0.34        |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -43         |\n",
            "|    explained_variance   | -0.00693    |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 16.8        |\n",
            "|    n_updates            | 380         |\n",
            "|    policy_gradient_loss | 0.00159     |\n",
            "|    reward               | -2.9684417  |\n",
            "|    std                  | 1.07        |\n",
            "|    value_loss           | 48.6        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 258         |\n",
            "|    iterations           | 40          |\n",
            "|    time_elapsed         | 316         |\n",
            "|    total_timesteps      | 81920       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.026893115 |\n",
            "|    clip_fraction        | 0.266       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -43.1       |\n",
            "|    explained_variance   | 0.0368      |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 36.5        |\n",
            "|    n_updates            | 390         |\n",
            "|    policy_gradient_loss | -0.0104     |\n",
            "|    reward               | -1.2478313  |\n",
            "|    std                  | 1.07        |\n",
            "|    value_loss           | 76.1        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 258         |\n",
            "|    iterations           | 41          |\n",
            "|    time_elapsed         | 324         |\n",
            "|    total_timesteps      | 83968       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.022934677 |\n",
            "|    clip_fraction        | 0.224       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -43.1       |\n",
            "|    explained_variance   | 0.0187      |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 61          |\n",
            "|    n_updates            | 400         |\n",
            "|    policy_gradient_loss | -0.00554    |\n",
            "|    reward               | 0.23241846  |\n",
            "|    std                  | 1.07        |\n",
            "|    value_loss           | 155         |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 258         |\n",
            "|    iterations           | 42          |\n",
            "|    time_elapsed         | 332         |\n",
            "|    total_timesteps      | 86016       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.018318513 |\n",
            "|    clip_fraction        | 0.193       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -43.2       |\n",
            "|    explained_variance   | 0.0151      |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 35.8        |\n",
            "|    n_updates            | 410         |\n",
            "|    policy_gradient_loss | -0.00601    |\n",
            "|    reward               | 0.7993551   |\n",
            "|    std                  | 1.07        |\n",
            "|    value_loss           | 74.2        |\n",
            "-----------------------------------------\n",
            "----------------------------------------\n",
            "| time/                   |            |\n",
            "|    fps                  | 258        |\n",
            "|    iterations           | 43         |\n",
            "|    time_elapsed         | 340        |\n",
            "|    total_timesteps      | 88064      |\n",
            "| train/                  |            |\n",
            "|    approx_kl            | 0.02744143 |\n",
            "|    clip_fraction        | 0.273      |\n",
            "|    clip_range           | 0.2        |\n",
            "|    entropy_loss         | -43.2      |\n",
            "|    explained_variance   | 0.0756     |\n",
            "|    learning_rate        | 0.00025    |\n",
            "|    loss                 | 12.6       |\n",
            "|    n_updates            | 420        |\n",
            "|    policy_gradient_loss | -0.00646   |\n",
            "|    reward               | -1.4263109 |\n",
            "|    std                  | 1.07       |\n",
            "|    value_loss           | 21.6       |\n",
            "----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 259         |\n",
            "|    iterations           | 44          |\n",
            "|    time_elapsed         | 347         |\n",
            "|    total_timesteps      | 90112       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.020070804 |\n",
            "|    clip_fraction        | 0.175       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -43.2       |\n",
            "|    explained_variance   | 0.0358      |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 57.9        |\n",
            "|    n_updates            | 430         |\n",
            "|    policy_gradient_loss | -0.0114     |\n",
            "|    reward               | 0.12298692  |\n",
            "|    std                  | 1.07        |\n",
            "|    value_loss           | 106         |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 259         |\n",
            "|    iterations           | 45          |\n",
            "|    time_elapsed         | 355         |\n",
            "|    total_timesteps      | 92160       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.026247777 |\n",
            "|    clip_fraction        | 0.212       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -43.2       |\n",
            "|    explained_variance   | 0.0587      |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 52.3        |\n",
            "|    n_updates            | 440         |\n",
            "|    policy_gradient_loss | -0.0121     |\n",
            "|    reward               | -1.1159879  |\n",
            "|    std                  | 1.08        |\n",
            "|    value_loss           | 123         |\n",
            "-----------------------------------------\n",
            "day: 2892, episode: 70\n",
            "begin_total_asset: 1000000.00\n",
            "end_total_asset: 5164852.87\n",
            "total_reward: 4164852.87\n",
            "total_cost: 288378.77\n",
            "total_trades: 76602\n",
            "Sharpe: 0.977\n",
            "=================================\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 259         |\n",
            "|    iterations           | 46          |\n",
            "|    time_elapsed         | 363         |\n",
            "|    total_timesteps      | 94208       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.025694236 |\n",
            "|    clip_fraction        | 0.256       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -43.3       |\n",
            "|    explained_variance   | 0.0352      |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 8.13        |\n",
            "|    n_updates            | 450         |\n",
            "|    policy_gradient_loss | -0.00587    |\n",
            "|    reward               | -7.0544744  |\n",
            "|    std                  | 1.08        |\n",
            "|    value_loss           | 20.4        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 259         |\n",
            "|    iterations           | 47          |\n",
            "|    time_elapsed         | 371         |\n",
            "|    total_timesteps      | 96256       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.025015108 |\n",
            "|    clip_fraction        | 0.189       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -43.3       |\n",
            "|    explained_variance   | 0.0305      |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 64.6        |\n",
            "|    n_updates            | 460         |\n",
            "|    policy_gradient_loss | -0.0137     |\n",
            "|    reward               | 1.194413    |\n",
            "|    std                  | 1.08        |\n",
            "|    value_loss           | 69.2        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 259         |\n",
            "|    iterations           | 48          |\n",
            "|    time_elapsed         | 379         |\n",
            "|    total_timesteps      | 98304       |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.023960821 |\n",
            "|    clip_fraction        | 0.211       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -43.3       |\n",
            "|    explained_variance   | 0.0592      |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 114         |\n",
            "|    n_updates            | 470         |\n",
            "|    policy_gradient_loss | -0.00981    |\n",
            "|    reward               | 15.363016   |\n",
            "|    std                  | 1.08        |\n",
            "|    value_loss           | 150         |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 259         |\n",
            "|    iterations           | 49          |\n",
            "|    time_elapsed         | 387         |\n",
            "|    total_timesteps      | 100352      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.028245423 |\n",
            "|    clip_fraction        | 0.278       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -43.3       |\n",
            "|    explained_variance   | -0.00187    |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 43.6        |\n",
            "|    n_updates            | 480         |\n",
            "|    policy_gradient_loss | 0.000369    |\n",
            "|    reward               | -2.748413   |\n",
            "|    std                  | 1.08        |\n",
            "|    value_loss           | 92.5        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 259         |\n",
            "|    iterations           | 50          |\n",
            "|    time_elapsed         | 395         |\n",
            "|    total_timesteps      | 102400      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.031188287 |\n",
            "|    clip_fraction        | 0.279       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -43.4       |\n",
            "|    explained_variance   | 0.0748      |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 12.6        |\n",
            "|    n_updates            | 490         |\n",
            "|    policy_gradient_loss | -0.0139     |\n",
            "|    reward               | -0.5266687  |\n",
            "|    std                  | 1.08        |\n",
            "|    value_loss           | 29.1        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 259         |\n",
            "|    iterations           | 51          |\n",
            "|    time_elapsed         | 403         |\n",
            "|    total_timesteps      | 104448      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.035824828 |\n",
            "|    clip_fraction        | 0.327       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -43.4       |\n",
            "|    explained_variance   | 0.0209      |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 71.5        |\n",
            "|    n_updates            | 500         |\n",
            "|    policy_gradient_loss | -0.00316    |\n",
            "|    reward               | 0.22485653  |\n",
            "|    std                  | 1.08        |\n",
            "|    value_loss           | 168         |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 259         |\n",
            "|    iterations           | 52          |\n",
            "|    time_elapsed         | 410         |\n",
            "|    total_timesteps      | 106496      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.028914222 |\n",
            "|    clip_fraction        | 0.266       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -43.5       |\n",
            "|    explained_variance   | 0.0417      |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 46          |\n",
            "|    n_updates            | 510         |\n",
            "|    policy_gradient_loss | -0.00469    |\n",
            "|    reward               | -2.832254   |\n",
            "|    std                  | 1.08        |\n",
            "|    value_loss           | 143         |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 259         |\n",
            "|    iterations           | 53          |\n",
            "|    time_elapsed         | 418         |\n",
            "|    total_timesteps      | 108544      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.024428545 |\n",
            "|    clip_fraction        | 0.31        |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -43.5       |\n",
            "|    explained_variance   | 0.238       |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 10.2        |\n",
            "|    n_updates            | 520         |\n",
            "|    policy_gradient_loss | 0.00222     |\n",
            "|    reward               | 1.5100558   |\n",
            "|    std                  | 1.08        |\n",
            "|    value_loss           | 20.1        |\n",
            "-----------------------------------------\n",
            "----------------------------------------\n",
            "| time/                   |            |\n",
            "|    fps                  | 259        |\n",
            "|    iterations           | 54         |\n",
            "|    time_elapsed         | 426        |\n",
            "|    total_timesteps      | 110592     |\n",
            "| train/                  |            |\n",
            "|    approx_kl            | 0.02579272 |\n",
            "|    clip_fraction        | 0.21       |\n",
            "|    clip_range           | 0.2        |\n",
            "|    entropy_loss         | -43.5      |\n",
            "|    explained_variance   | 0.145      |\n",
            "|    learning_rate        | 0.00025    |\n",
            "|    loss                 | 73.8       |\n",
            "|    n_updates            | 530        |\n",
            "|    policy_gradient_loss | -0.00919   |\n",
            "|    reward               | 0.17646939 |\n",
            "|    std                  | 1.09       |\n",
            "|    value_loss           | 71.4       |\n",
            "----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 259         |\n",
            "|    iterations           | 55          |\n",
            "|    time_elapsed         | 434         |\n",
            "|    total_timesteps      | 112640      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.029723924 |\n",
            "|    clip_fraction        | 0.303       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -43.6       |\n",
            "|    explained_variance   | 0.0773      |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 47.2        |\n",
            "|    n_updates            | 540         |\n",
            "|    policy_gradient_loss | -0.000984   |\n",
            "|    reward               | 6.941355    |\n",
            "|    std                  | 1.09        |\n",
            "|    value_loss           | 100         |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 259         |\n",
            "|    iterations           | 56          |\n",
            "|    time_elapsed         | 442         |\n",
            "|    total_timesteps      | 114688      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.028071523 |\n",
            "|    clip_fraction        | 0.286       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -43.6       |\n",
            "|    explained_variance   | 0.0608      |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 16.6        |\n",
            "|    n_updates            | 550         |\n",
            "|    policy_gradient_loss | -0.00183    |\n",
            "|    reward               | 1.3215259   |\n",
            "|    std                  | 1.09        |\n",
            "|    value_loss           | 46.4        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 259         |\n",
            "|    iterations           | 57          |\n",
            "|    time_elapsed         | 450         |\n",
            "|    total_timesteps      | 116736      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.020538481 |\n",
            "|    clip_fraction        | 0.253       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -43.7       |\n",
            "|    explained_variance   | 0.127       |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 49.7        |\n",
            "|    n_updates            | 560         |\n",
            "|    policy_gradient_loss | -0.0116     |\n",
            "|    reward               | -1.8079702  |\n",
            "|    std                  | 1.09        |\n",
            "|    value_loss           | 71.1        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 259         |\n",
            "|    iterations           | 58          |\n",
            "|    time_elapsed         | 457         |\n",
            "|    total_timesteps      | 118784      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.031003634 |\n",
            "|    clip_fraction        | 0.302       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -43.7       |\n",
            "|    explained_variance   | 0.124       |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 63.1        |\n",
            "|    n_updates            | 570         |\n",
            "|    policy_gradient_loss | -0.00229    |\n",
            "|    reward               | 0.97340846  |\n",
            "|    std                  | 1.09        |\n",
            "|    value_loss           | 120         |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 259         |\n",
            "|    iterations           | 59          |\n",
            "|    time_elapsed         | 465         |\n",
            "|    total_timesteps      | 120832      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.036076754 |\n",
            "|    clip_fraction        | 0.308       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -43.8       |\n",
            "|    explained_variance   | 0.0404      |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 42.6        |\n",
            "|    n_updates            | 580         |\n",
            "|    policy_gradient_loss | 0.00353     |\n",
            "|    reward               | 2.0065172   |\n",
            "|    std                  | 1.1         |\n",
            "|    value_loss           | 103         |\n",
            "-----------------------------------------\n",
            "day: 2892, episode: 80\n",
            "begin_total_asset: 1000000.00\n",
            "end_total_asset: 6113759.65\n",
            "total_reward: 5113759.65\n",
            "total_cost: 231470.88\n",
            "total_trades: 72914\n",
            "Sharpe: 0.931\n",
            "=================================\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 259         |\n",
            "|    iterations           | 60          |\n",
            "|    time_elapsed         | 473         |\n",
            "|    total_timesteps      | 122880      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.019886103 |\n",
            "|    clip_fraction        | 0.19        |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -43.8       |\n",
            "|    explained_variance   | 0.288       |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 8.22        |\n",
            "|    n_updates            | 590         |\n",
            "|    policy_gradient_loss | -0.00586    |\n",
            "|    reward               | 0.107104875 |\n",
            "|    std                  | 1.1         |\n",
            "|    value_loss           | 22.6        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 259         |\n",
            "|    iterations           | 61          |\n",
            "|    time_elapsed         | 481         |\n",
            "|    total_timesteps      | 124928      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.017066857 |\n",
            "|    clip_fraction        | 0.2         |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -43.9       |\n",
            "|    explained_variance   | 0.112       |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 49.7        |\n",
            "|    n_updates            | 600         |\n",
            "|    policy_gradient_loss | -0.00149    |\n",
            "|    reward               | 1.2281696   |\n",
            "|    std                  | 1.1         |\n",
            "|    value_loss           | 132         |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 259         |\n",
            "|    iterations           | 62          |\n",
            "|    time_elapsed         | 489         |\n",
            "|    total_timesteps      | 126976      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.016895013 |\n",
            "|    clip_fraction        | 0.139       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -43.9       |\n",
            "|    explained_variance   | 0.12        |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 55.8        |\n",
            "|    n_updates            | 610         |\n",
            "|    policy_gradient_loss | -0.00382    |\n",
            "|    reward               | 3.3853097   |\n",
            "|    std                  | 1.1         |\n",
            "|    value_loss           | 121         |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 259         |\n",
            "|    iterations           | 63          |\n",
            "|    time_elapsed         | 496         |\n",
            "|    total_timesteps      | 129024      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.033882424 |\n",
            "|    clip_fraction        | 0.305       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -43.9       |\n",
            "|    explained_variance   | 0.0419      |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 17.2        |\n",
            "|    n_updates            | 620         |\n",
            "|    policy_gradient_loss | 0.00595     |\n",
            "|    reward               | 2.8069315   |\n",
            "|    std                  | 1.1         |\n",
            "|    value_loss           | 32.5        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 259         |\n",
            "|    iterations           | 64          |\n",
            "|    time_elapsed         | 504         |\n",
            "|    total_timesteps      | 131072      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.020187281 |\n",
            "|    clip_fraction        | 0.166       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -43.9       |\n",
            "|    explained_variance   | 0.0544      |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 27.8        |\n",
            "|    n_updates            | 630         |\n",
            "|    policy_gradient_loss | -0.00276    |\n",
            "|    reward               | -1.0588717  |\n",
            "|    std                  | 1.1         |\n",
            "|    value_loss           | 80.6        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 259         |\n",
            "|    iterations           | 65          |\n",
            "|    time_elapsed         | 512         |\n",
            "|    total_timesteps      | 133120      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.026242675 |\n",
            "|    clip_fraction        | 0.152       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -44         |\n",
            "|    explained_variance   | 0.198       |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 53.2        |\n",
            "|    n_updates            | 640         |\n",
            "|    policy_gradient_loss | -0.00627    |\n",
            "|    reward               | -1.2091904  |\n",
            "|    std                  | 1.1         |\n",
            "|    value_loss           | 116         |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 259         |\n",
            "|    iterations           | 66          |\n",
            "|    time_elapsed         | 520         |\n",
            "|    total_timesteps      | 135168      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.014815284 |\n",
            "|    clip_fraction        | 0.153       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -44         |\n",
            "|    explained_variance   | 0.0888      |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 51.5        |\n",
            "|    n_updates            | 650         |\n",
            "|    policy_gradient_loss | -0.00839    |\n",
            "|    reward               | 1.9226373   |\n",
            "|    std                  | 1.11        |\n",
            "|    value_loss           | 98.7        |\n",
            "-----------------------------------------\n",
            "----------------------------------------\n",
            "| time/                   |            |\n",
            "|    fps                  | 259        |\n",
            "|    iterations           | 67         |\n",
            "|    time_elapsed         | 528        |\n",
            "|    total_timesteps      | 137216     |\n",
            "| train/                  |            |\n",
            "|    approx_kl            | 0.03140388 |\n",
            "|    clip_fraction        | 0.25       |\n",
            "|    clip_range           | 0.2        |\n",
            "|    entropy_loss         | -44        |\n",
            "|    explained_variance   | 0.423      |\n",
            "|    learning_rate        | 0.00025    |\n",
            "|    loss                 | 12.9       |\n",
            "|    n_updates            | 660        |\n",
            "|    policy_gradient_loss | -0.00867   |\n",
            "|    reward               | -1.7564014 |\n",
            "|    std                  | 1.11       |\n",
            "|    value_loss           | 27.2       |\n",
            "----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 259         |\n",
            "|    iterations           | 68          |\n",
            "|    time_elapsed         | 535         |\n",
            "|    total_timesteps      | 139264      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.024263658 |\n",
            "|    clip_fraction        | 0.193       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -44.1       |\n",
            "|    explained_variance   | 0.135       |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 55.8        |\n",
            "|    n_updates            | 670         |\n",
            "|    policy_gradient_loss | -0.00405    |\n",
            "|    reward               | -0.5820698  |\n",
            "|    std                  | 1.11        |\n",
            "|    value_loss           | 170         |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 260         |\n",
            "|    iterations           | 69          |\n",
            "|    time_elapsed         | 543         |\n",
            "|    total_timesteps      | 141312      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.015419064 |\n",
            "|    clip_fraction        | 0.127       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -44.1       |\n",
            "|    explained_variance   | 0.105       |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 42.1        |\n",
            "|    n_updates            | 680         |\n",
            "|    policy_gradient_loss | -0.00695    |\n",
            "|    reward               | 4.467919    |\n",
            "|    std                  | 1.11        |\n",
            "|    value_loss           | 147         |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 260         |\n",
            "|    iterations           | 70          |\n",
            "|    time_elapsed         | 551         |\n",
            "|    total_timesteps      | 143360      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.024684295 |\n",
            "|    clip_fraction        | 0.231       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -44.1       |\n",
            "|    explained_variance   | 0.14        |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 24.4        |\n",
            "|    n_updates            | 690         |\n",
            "|    policy_gradient_loss | 9.73e-05    |\n",
            "|    reward               | 0.9176686   |\n",
            "|    std                  | 1.11        |\n",
            "|    value_loss           | 43.6        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 260         |\n",
            "|    iterations           | 71          |\n",
            "|    time_elapsed         | 559         |\n",
            "|    total_timesteps      | 145408      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.040954344 |\n",
            "|    clip_fraction        | 0.293       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -44.2       |\n",
            "|    explained_variance   | 0.103       |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 34.3        |\n",
            "|    n_updates            | 700         |\n",
            "|    policy_gradient_loss | -0.000853   |\n",
            "|    reward               | 0.41794342  |\n",
            "|    std                  | 1.11        |\n",
            "|    value_loss           | 134         |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 260         |\n",
            "|    iterations           | 72          |\n",
            "|    time_elapsed         | 566         |\n",
            "|    total_timesteps      | 147456      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.043037716 |\n",
            "|    clip_fraction        | 0.321       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -44.2       |\n",
            "|    explained_variance   | 0.174       |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 91.4        |\n",
            "|    n_updates            | 710         |\n",
            "|    policy_gradient_loss | 0.000202    |\n",
            "|    reward               | -19.98116   |\n",
            "|    std                  | 1.11        |\n",
            "|    value_loss           | 83.1        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 260         |\n",
            "|    iterations           | 73          |\n",
            "|    time_elapsed         | 574         |\n",
            "|    total_timesteps      | 149504      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.017187431 |\n",
            "|    clip_fraction        | 0.182       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -44.3       |\n",
            "|    explained_variance   | 0.0876      |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 41.5        |\n",
            "|    n_updates            | 720         |\n",
            "|    policy_gradient_loss | -0.00969    |\n",
            "|    reward               | -3.8284647  |\n",
            "|    std                  | 1.12        |\n",
            "|    value_loss           | 79.3        |\n",
            "-----------------------------------------\n",
            "day: 2892, episode: 90\n",
            "begin_total_asset: 1000000.00\n",
            "end_total_asset: 6458901.60\n",
            "total_reward: 5458901.60\n",
            "total_cost: 215837.23\n",
            "total_trades: 71313\n",
            "Sharpe: 0.886\n",
            "=================================\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 260         |\n",
            "|    iterations           | 74          |\n",
            "|    time_elapsed         | 582         |\n",
            "|    total_timesteps      | 151552      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.026079465 |\n",
            "|    clip_fraction        | 0.241       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -44.3       |\n",
            "|    explained_variance   | 0.16        |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 27.4        |\n",
            "|    n_updates            | 730         |\n",
            "|    policy_gradient_loss | -0.00635    |\n",
            "|    reward               | -3.3378792  |\n",
            "|    std                  | 1.12        |\n",
            "|    value_loss           | 59.4        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 260         |\n",
            "|    iterations           | 75          |\n",
            "|    time_elapsed         | 590         |\n",
            "|    total_timesteps      | 153600      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.025451917 |\n",
            "|    clip_fraction        | 0.222       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -44.4       |\n",
            "|    explained_variance   | 0.0644      |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 65.2        |\n",
            "|    n_updates            | 740         |\n",
            "|    policy_gradient_loss | -0.00554    |\n",
            "|    reward               | 1.439964    |\n",
            "|    std                  | 1.12        |\n",
            "|    value_loss           | 158         |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 260         |\n",
            "|    iterations           | 76          |\n",
            "|    time_elapsed         | 597         |\n",
            "|    total_timesteps      | 155648      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.038097903 |\n",
            "|    clip_fraction        | 0.345       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -44.4       |\n",
            "|    explained_variance   | 0.114       |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 39.8        |\n",
            "|    n_updates            | 750         |\n",
            "|    policy_gradient_loss | 0.00385     |\n",
            "|    reward               | -6.5932136  |\n",
            "|    std                  | 1.12        |\n",
            "|    value_loss           | 109         |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 260         |\n",
            "|    iterations           | 77          |\n",
            "|    time_elapsed         | 605         |\n",
            "|    total_timesteps      | 157696      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.029659446 |\n",
            "|    clip_fraction        | 0.302       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -44.5       |\n",
            "|    explained_variance   | 0.32        |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 21.7        |\n",
            "|    n_updates            | 760         |\n",
            "|    policy_gradient_loss | -0.00552    |\n",
            "|    reward               | 1.3719094   |\n",
            "|    std                  | 1.13        |\n",
            "|    value_loss           | 35.1        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 260         |\n",
            "|    iterations           | 78          |\n",
            "|    time_elapsed         | 613         |\n",
            "|    total_timesteps      | 159744      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.026895307 |\n",
            "|    clip_fraction        | 0.235       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -44.6       |\n",
            "|    explained_variance   | 0.253       |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 24.1        |\n",
            "|    n_updates            | 770         |\n",
            "|    policy_gradient_loss | -0.00313    |\n",
            "|    reward               | 1.7809488   |\n",
            "|    std                  | 1.13        |\n",
            "|    value_loss           | 99.4        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 260         |\n",
            "|    iterations           | 79          |\n",
            "|    time_elapsed         | 621         |\n",
            "|    total_timesteps      | 161792      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.017895402 |\n",
            "|    clip_fraction        | 0.184       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -44.6       |\n",
            "|    explained_variance   | 0.155       |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 50.5        |\n",
            "|    n_updates            | 780         |\n",
            "|    policy_gradient_loss | -0.00598    |\n",
            "|    reward               | 1.0942178   |\n",
            "|    std                  | 1.13        |\n",
            "|    value_loss           | 101         |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 260         |\n",
            "|    iterations           | 80          |\n",
            "|    time_elapsed         | 629         |\n",
            "|    total_timesteps      | 163840      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.029639244 |\n",
            "|    clip_fraction        | 0.278       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -44.7       |\n",
            "|    explained_variance   | 0.161       |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 19.7        |\n",
            "|    n_updates            | 790         |\n",
            "|    policy_gradient_loss | -0.00817    |\n",
            "|    reward               | -0.17899549 |\n",
            "|    std                  | 1.13        |\n",
            "|    value_loss           | 45          |\n",
            "-----------------------------------------\n",
            "------------------------------------------\n",
            "| time/                   |              |\n",
            "|    fps                  | 260          |\n",
            "|    iterations           | 81           |\n",
            "|    time_elapsed         | 636          |\n",
            "|    total_timesteps      | 165888       |\n",
            "| train/                  |              |\n",
            "|    approx_kl            | 0.022830734  |\n",
            "|    clip_fraction        | 0.247        |\n",
            "|    clip_range           | 0.2          |\n",
            "|    entropy_loss         | -44.7        |\n",
            "|    explained_variance   | 0.131        |\n",
            "|    learning_rate        | 0.00025      |\n",
            "|    loss                 | 24.2         |\n",
            "|    n_updates            | 800          |\n",
            "|    policy_gradient_loss | -0.0132      |\n",
            "|    reward               | -0.094855145 |\n",
            "|    std                  | 1.13         |\n",
            "|    value_loss           | 55.3         |\n",
            "------------------------------------------\n",
            "----------------------------------------\n",
            "| time/                   |            |\n",
            "|    fps                  | 260        |\n",
            "|    iterations           | 82         |\n",
            "|    time_elapsed         | 644        |\n",
            "|    total_timesteps      | 167936     |\n",
            "| train/                  |            |\n",
            "|    approx_kl            | 0.02838034 |\n",
            "|    clip_fraction        | 0.269      |\n",
            "|    clip_range           | 0.2        |\n",
            "|    entropy_loss         | -44.8      |\n",
            "|    explained_variance   | 0.161      |\n",
            "|    learning_rate        | 0.00025    |\n",
            "|    loss                 | 21.3       |\n",
            "|    n_updates            | 810        |\n",
            "|    policy_gradient_loss | -0.00506   |\n",
            "|    reward               | 0.44646302 |\n",
            "|    std                  | 1.14       |\n",
            "|    value_loss           | 51.3       |\n",
            "----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 260         |\n",
            "|    iterations           | 83          |\n",
            "|    time_elapsed         | 652         |\n",
            "|    total_timesteps      | 169984      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.031894207 |\n",
            "|    clip_fraction        | 0.332       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -44.9       |\n",
            "|    explained_variance   | 0.0617      |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 29.2        |\n",
            "|    n_updates            | 820         |\n",
            "|    policy_gradient_loss | -0.00717    |\n",
            "|    reward               | -1.5274507  |\n",
            "|    std                  | 1.14        |\n",
            "|    value_loss           | 68.9        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 260         |\n",
            "|    iterations           | 84          |\n",
            "|    time_elapsed         | 660         |\n",
            "|    total_timesteps      | 172032      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.017578714 |\n",
            "|    clip_fraction        | 0.203       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -44.9       |\n",
            "|    explained_variance   | 0.362       |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 6.25        |\n",
            "|    n_updates            | 830         |\n",
            "|    policy_gradient_loss | -0.00792    |\n",
            "|    reward               | 0.6099827   |\n",
            "|    std                  | 1.14        |\n",
            "|    value_loss           | 19.5        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 260         |\n",
            "|    iterations           | 85          |\n",
            "|    time_elapsed         | 668         |\n",
            "|    total_timesteps      | 174080      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.02873035  |\n",
            "|    clip_fraction        | 0.194       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -44.9       |\n",
            "|    explained_variance   | 0.125       |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 21.1        |\n",
            "|    n_updates            | 840         |\n",
            "|    policy_gradient_loss | -0.00787    |\n",
            "|    reward               | -0.11589265 |\n",
            "|    std                  | 1.14        |\n",
            "|    value_loss           | 73.2        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 260         |\n",
            "|    iterations           | 86          |\n",
            "|    time_elapsed         | 675         |\n",
            "|    total_timesteps      | 176128      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.037067182 |\n",
            "|    clip_fraction        | 0.262       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -45         |\n",
            "|    explained_variance   | 0.101       |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 25.4        |\n",
            "|    n_updates            | 850         |\n",
            "|    policy_gradient_loss | -0.00597    |\n",
            "|    reward               | 2.3699293   |\n",
            "|    std                  | 1.14        |\n",
            "|    value_loss           | 85.6        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 260         |\n",
            "|    iterations           | 87          |\n",
            "|    time_elapsed         | 683         |\n",
            "|    total_timesteps      | 178176      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.027373867 |\n",
            "|    clip_fraction        | 0.262       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -45         |\n",
            "|    explained_variance   | 0.0411      |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 20.1        |\n",
            "|    n_updates            | 860         |\n",
            "|    policy_gradient_loss | -0.00214    |\n",
            "|    reward               | 1.5545934   |\n",
            "|    std                  | 1.14        |\n",
            "|    value_loss           | 48.2        |\n",
            "-----------------------------------------\n",
            "day: 2892, episode: 100\n",
            "begin_total_asset: 1000000.00\n",
            "end_total_asset: 5757166.25\n",
            "total_reward: 4757166.25\n",
            "total_cost: 233558.68\n",
            "total_trades: 72765\n",
            "Sharpe: 0.833\n",
            "=================================\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 260         |\n",
            "|    iterations           | 88          |\n",
            "|    time_elapsed         | 691         |\n",
            "|    total_timesteps      | 180224      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.022608444 |\n",
            "|    clip_fraction        | 0.268       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -45         |\n",
            "|    explained_variance   | 0.17        |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 17.1        |\n",
            "|    n_updates            | 870         |\n",
            "|    policy_gradient_loss | -0.0101     |\n",
            "|    reward               | -3.0007036  |\n",
            "|    std                  | 1.14        |\n",
            "|    value_loss           | 89.6        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 260         |\n",
            "|    iterations           | 89          |\n",
            "|    time_elapsed         | 699         |\n",
            "|    total_timesteps      | 182272      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.021231357 |\n",
            "|    clip_fraction        | 0.148       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -45         |\n",
            "|    explained_variance   | 0.0935      |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 82.9        |\n",
            "|    n_updates            | 880         |\n",
            "|    policy_gradient_loss | 0.000506    |\n",
            "|    reward               | -0.06830058 |\n",
            "|    std                  | 1.14        |\n",
            "|    value_loss           | 165         |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 260         |\n",
            "|    iterations           | 90          |\n",
            "|    time_elapsed         | 706         |\n",
            "|    total_timesteps      | 184320      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.023269184 |\n",
            "|    clip_fraction        | 0.314       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -45         |\n",
            "|    explained_variance   | 0.0455      |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 49.9        |\n",
            "|    n_updates            | 890         |\n",
            "|    policy_gradient_loss | -0.00435    |\n",
            "|    reward               | -0.20916463 |\n",
            "|    std                  | 1.14        |\n",
            "|    value_loss           | 148         |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 260         |\n",
            "|    iterations           | 91          |\n",
            "|    time_elapsed         | 714         |\n",
            "|    total_timesteps      | 186368      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.041500464 |\n",
            "|    clip_fraction        | 0.257       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -45.1       |\n",
            "|    explained_variance   | 0.398       |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 14.7        |\n",
            "|    n_updates            | 900         |\n",
            "|    policy_gradient_loss | -0.00705    |\n",
            "|    reward               | -0.3923409  |\n",
            "|    std                  | 1.15        |\n",
            "|    value_loss           | 31.7        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 260         |\n",
            "|    iterations           | 92          |\n",
            "|    time_elapsed         | 722         |\n",
            "|    total_timesteps      | 188416      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.023837274 |\n",
            "|    clip_fraction        | 0.174       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -45.1       |\n",
            "|    explained_variance   | 0.184       |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 63          |\n",
            "|    n_updates            | 910         |\n",
            "|    policy_gradient_loss | -0.00454    |\n",
            "|    reward               | -1.928996   |\n",
            "|    std                  | 1.15        |\n",
            "|    value_loss           | 194         |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 260         |\n",
            "|    iterations           | 93          |\n",
            "|    time_elapsed         | 730         |\n",
            "|    total_timesteps      | 190464      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.033360228 |\n",
            "|    clip_fraction        | 0.262       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -45.1       |\n",
            "|    explained_variance   | 0.131       |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 57.6        |\n",
            "|    n_updates            | 920         |\n",
            "|    policy_gradient_loss | -0.00356    |\n",
            "|    reward               | -2.9505231  |\n",
            "|    std                  | 1.15        |\n",
            "|    value_loss           | 132         |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 260         |\n",
            "|    iterations           | 94          |\n",
            "|    time_elapsed         | 737         |\n",
            "|    total_timesteps      | 192512      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.048173904 |\n",
            "|    clip_fraction        | 0.321       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -45.1       |\n",
            "|    explained_variance   | 0.187       |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 21.3        |\n",
            "|    n_updates            | 930         |\n",
            "|    policy_gradient_loss | -0.00191    |\n",
            "|    reward               | -0.603856   |\n",
            "|    std                  | 1.15        |\n",
            "|    value_loss           | 43.9        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 260         |\n",
            "|    iterations           | 95          |\n",
            "|    time_elapsed         | 745         |\n",
            "|    total_timesteps      | 194560      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.033114955 |\n",
            "|    clip_fraction        | 0.315       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -45.2       |\n",
            "|    explained_variance   | 0.148       |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 107         |\n",
            "|    n_updates            | 940         |\n",
            "|    policy_gradient_loss | -0.00892    |\n",
            "|    reward               | -0.8720667  |\n",
            "|    std                  | 1.15        |\n",
            "|    value_loss           | 172         |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 260         |\n",
            "|    iterations           | 96          |\n",
            "|    time_elapsed         | 753         |\n",
            "|    total_timesteps      | 196608      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.04076962  |\n",
            "|    clip_fraction        | 0.342       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -45.3       |\n",
            "|    explained_variance   | 0.161       |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 27.9        |\n",
            "|    n_updates            | 950         |\n",
            "|    policy_gradient_loss | -0.0149     |\n",
            "|    reward               | -0.16824886 |\n",
            "|    std                  | 1.16        |\n",
            "|    value_loss           | 69.8        |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 260         |\n",
            "|    iterations           | 97          |\n",
            "|    time_elapsed         | 761         |\n",
            "|    total_timesteps      | 198656      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.033305183 |\n",
            "|    clip_fraction        | 0.286       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -45.4       |\n",
            "|    explained_variance   | 0.126       |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 31.5        |\n",
            "|    n_updates            | 960         |\n",
            "|    policy_gradient_loss | 0.00523     |\n",
            "|    reward               | 0.28580785  |\n",
            "|    std                  | 1.16        |\n",
            "|    value_loss           | 50          |\n",
            "-----------------------------------------\n",
            "-----------------------------------------\n",
            "| time/                   |             |\n",
            "|    fps                  | 261         |\n",
            "|    iterations           | 98          |\n",
            "|    time_elapsed         | 768         |\n",
            "|    total_timesteps      | 200704      |\n",
            "| train/                  |             |\n",
            "|    approx_kl            | 0.044834472 |\n",
            "|    clip_fraction        | 0.308       |\n",
            "|    clip_range           | 0.2         |\n",
            "|    entropy_loss         | -45.4       |\n",
            "|    explained_variance   | 0.0542      |\n",
            "|    learning_rate        | 0.00025     |\n",
            "|    loss                 | 91.3        |\n",
            "|    n_updates            | 970         |\n",
            "|    policy_gradient_loss | -0.0103     |\n",
            "|    reward               | -0.12478354 |\n",
            "|    std                  | 1.16        |\n",
            "|    value_loss           | 65.7        |\n",
            "-----------------------------------------\n"
          ]
        }
      ],
      "source": [
        "trained_ppo = agent.train_model(model=model_ppo,\n",
        "                                tb_log_name='ppo',\n",
        "                                total_timesteps=200000) if if_using_ppo else None"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 62,
      "metadata": {
        "id": "C6AidlWyvwzm"
      },
      "outputs": [],
      "source": [
        "trained_ppo.save(TRAINED_MODEL_DIR + \"/agent_ppo\") if if_using_ppo else None"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "3Zpv4S0-fDBv"
      },
      "source": [
        "### Agent 4: TD3"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 63,
      "metadata": {
        "id": "JSAHhV4Xc-bh"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "{'batch_size': 100, 'buffer_size': 1000000, 'learning_rate': 0.001}\n",
            "Using cpu device\n",
            "Logging to results/td3\n"
          ]
        }
      ],
      "source": [
        "agent = DRLAgent(env=env_train)\n",
        "TD3_PARAMS = {\"batch_size\": 100,\n",
        "              \"buffer_size\": 1000000,\n",
        "              \"learning_rate\": 0.001}\n",
        "\n",
        "model_td3 = agent.get_model(\"td3\", model_kwargs=TD3_PARAMS)\n",
        "\n",
        "if if_using_td3:\n",
        "  # set up logger\n",
        "  tmp_path = RESULTS_DIR + '/td3'\n",
        "  new_logger_td3 = configure(tmp_path, [\"stdout\", \"csv\", \"tensorboard\"])\n",
        "  # Set new logger\n",
        "  model_td3.set_logger(new_logger_td3)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 64,
      "metadata": {
        "id": "OSRxNYAxdKpU"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "day: 2892, episode: 110\n",
            "begin_total_asset: 1000000.00\n",
            "end_total_asset: 4630528.14\n",
            "total_reward: 3630528.14\n",
            "total_cost: 999.00\n",
            "total_trades: 54948\n",
            "Sharpe: 0.799\n",
            "=================================\n",
            "----------------------------------\n",
            "| time/              |           |\n",
            "|    episodes        | 4         |\n",
            "|    fps             | 126       |\n",
            "|    time_elapsed    | 91        |\n",
            "|    total_timesteps | 11572     |\n",
            "| train/             |           |\n",
            "|    actor_loss      | -20.3     |\n",
            "|    critic_loss     | 945       |\n",
            "|    learning_rate   | 0.001     |\n",
            "|    n_updates       | 8679      |\n",
            "|    reward          | 3.2279286 |\n",
            "----------------------------------\n",
            "----------------------------------\n",
            "| time/              |           |\n",
            "|    episodes        | 8         |\n",
            "|    fps             | 116       |\n",
            "|    time_elapsed    | 198       |\n",
            "|    total_timesteps | 23144     |\n",
            "| train/             |           |\n",
            "|    actor_loss      | 18.7      |\n",
            "|    critic_loss     | 41.5      |\n",
            "|    learning_rate   | 0.001     |\n",
            "|    n_updates       | 20251     |\n",
            "|    reward          | 3.2279286 |\n",
            "----------------------------------\n",
            "day: 2892, episode: 120\n",
            "begin_total_asset: 1000000.00\n",
            "end_total_asset: 4630528.14\n",
            "total_reward: 3630528.14\n",
            "total_cost: 999.00\n",
            "total_trades: 54948\n",
            "Sharpe: 0.799\n",
            "=================================\n",
            "----------------------------------\n",
            "| time/              |           |\n",
            "|    episodes        | 12        |\n",
            "|    fps             | 113       |\n",
            "|    time_elapsed    | 304       |\n",
            "|    total_timesteps | 34716     |\n",
            "| train/             |           |\n",
            "|    actor_loss      | 21.5      |\n",
            "|    critic_loss     | 13.4      |\n",
            "|    learning_rate   | 0.001     |\n",
            "|    n_updates       | 31823     |\n",
            "|    reward          | 3.2279286 |\n",
            "----------------------------------\n",
            "----------------------------------\n",
            "| time/              |           |\n",
            "|    episodes        | 16        |\n",
            "|    fps             | 112       |\n",
            "|    time_elapsed    | 409       |\n",
            "|    total_timesteps | 46288     |\n",
            "| train/             |           |\n",
            "|    actor_loss      | 23.9      |\n",
            "|    critic_loss     | 9.85      |\n",
            "|    learning_rate   | 0.001     |\n",
            "|    n_updates       | 43395     |\n",
            "|    reward          | 3.2279286 |\n",
            "----------------------------------\n"
          ]
        }
      ],
      "source": [
        "trained_td3 = agent.train_model(model=model_td3,\n",
        "                                tb_log_name='td3',\n",
        "                                total_timesteps=50000) if if_using_td3 else None"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 65,
      "metadata": {
        "id": "OkJV6V_mv2hw"
      },
      "outputs": [],
      "source": [
        "trained_td3.save(TRAINED_MODEL_DIR + \"/agent_td3\") if if_using_td3 else None"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Dr49PotrfG01"
      },
      "source": [
        "### Agent 5: SAC"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 66,
      "metadata": {
        "id": "xwOhVjqRkCdM"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "{'batch_size': 128, 'buffer_size': 100000, 'learning_rate': 0.0001, 'learning_starts': 100, 'ent_coef': 'auto_0.1'}\n",
            "Using cpu device\n",
            "Logging to results/sac\n"
          ]
        }
      ],
      "source": [
        "agent = DRLAgent(env=env_train)\n",
        "SAC_PARAMS = {\n",
        "    \"batch_size\": 128,\n",
        "    \"buffer_size\": 100000,\n",
        "    \"learning_rate\": 0.0001,\n",
        "    \"learning_starts\": 100,\n",
        "    \"ent_coef\": \"auto_0.1\",\n",
        "}\n",
        "\n",
        "model_sac = agent.get_model(\"sac\", model_kwargs=SAC_PARAMS)\n",
        "\n",
        "if if_using_sac:\n",
        "  # set up logger\n",
        "  tmp_path = RESULTS_DIR + '/sac'\n",
        "  new_logger_sac = configure(tmp_path, [\"stdout\", \"csv\", \"tensorboard\"])\n",
        "  # Set new logger\n",
        "  model_sac.set_logger(new_logger_sac)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 67,
      "metadata": {
        "id": "K8RSdKCckJyH"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "day: 2892, episode: 130\n",
            "begin_total_asset: 1000000.00\n",
            "end_total_asset: 8022184.63\n",
            "total_reward: 7022184.63\n",
            "total_cost: 9804.63\n",
            "total_trades: 48610\n",
            "Sharpe: 0.990\n",
            "=================================\n",
            "----------------------------------\n",
            "| time/              |           |\n",
            "|    episodes        | 4         |\n",
            "|    fps             | 87        |\n",
            "|    time_elapsed    | 132       |\n",
            "|    total_timesteps | 11572     |\n",
            "| train/             |           |\n",
            "|    actor_loss      | 1.67e+03  |\n",
            "|    critic_loss     | 327       |\n",
            "|    ent_coef        | 0.282     |\n",
            "|    ent_coef_loss   | 145       |\n",
            "|    learning_rate   | 0.0001    |\n",
            "|    n_updates       | 11471     |\n",
            "|    reward          | 3.9345849 |\n",
            "----------------------------------\n",
            "----------------------------------\n",
            "| time/              |           |\n",
            "|    episodes        | 8         |\n",
            "|    fps             | 86        |\n",
            "|    time_elapsed    | 266       |\n",
            "|    total_timesteps | 23144     |\n",
            "| train/             |           |\n",
            "|    actor_loss      | 1.05e+03  |\n",
            "|    critic_loss     | 191       |\n",
            "|    ent_coef        | 0.156     |\n",
            "|    ent_coef_loss   | -85.5     |\n",
            "|    learning_rate   | 0.0001    |\n",
            "|    n_updates       | 23043     |\n",
            "|    reward          | 2.4166472 |\n",
            "----------------------------------\n",
            "---------------------------------\n",
            "| time/              |          |\n",
            "|    episodes        | 12       |\n",
            "|    fps             | 86       |\n",
            "|    time_elapsed    | 399      |\n",
            "|    total_timesteps | 34716    |\n",
            "| train/             |          |\n",
            "|    actor_loss      | 531      |\n",
            "|    critic_loss     | 152      |\n",
            "|    ent_coef        | 0.0495   |\n",
            "|    ent_coef_loss   | -123     |\n",
            "|    learning_rate   | 0.0001   |\n",
            "|    n_updates       | 34615    |\n",
            "|    reward          | 4.92099  |\n",
            "---------------------------------\n",
            "day: 2892, episode: 140\n",
            "begin_total_asset: 1000000.00\n",
            "end_total_asset: 3020750.09\n",
            "total_reward: 2020750.09\n",
            "total_cost: 14818.61\n",
            "total_trades: 50934\n",
            "Sharpe: 0.583\n",
            "=================================\n",
            "----------------------------------\n",
            "| time/              |           |\n",
            "|    episodes        | 16        |\n",
            "|    fps             | 87        |\n",
            "|    time_elapsed    | 531       |\n",
            "|    total_timesteps | 46288     |\n",
            "| train/             |           |\n",
            "|    actor_loss      | 282       |\n",
            "|    critic_loss     | 12.4      |\n",
            "|    ent_coef        | 0.0159    |\n",
            "|    ent_coef_loss   | -130      |\n",
            "|    learning_rate   | 0.0001    |\n",
            "|    n_updates       | 46187     |\n",
            "|    reward          | 4.4681573 |\n",
            "----------------------------------\n",
            "----------------------------------\n",
            "| time/              |           |\n",
            "|    episodes        | 20        |\n",
            "|    fps             | 87        |\n",
            "|    time_elapsed    | 664       |\n",
            "|    total_timesteps | 57860     |\n",
            "| train/             |           |\n",
            "|    actor_loss      | 160       |\n",
            "|    critic_loss     | 7.13      |\n",
            "|    ent_coef        | 0.00524   |\n",
            "|    ent_coef_loss   | -111      |\n",
            "|    learning_rate   | 0.0001    |\n",
            "|    n_updates       | 57759     |\n",
            "|    reward          | 4.1741095 |\n",
            "----------------------------------\n",
            "day: 2892, episode: 150\n",
            "begin_total_asset: 1000000.00\n",
            "end_total_asset: 3246048.79\n",
            "total_reward: 2246048.79\n",
            "total_cost: 1999.05\n",
            "total_trades: 48186\n",
            "Sharpe: 0.627\n",
            "=================================\n",
            "----------------------------------\n",
            "| time/              |           |\n",
            "|    episodes        | 24        |\n",
            "|    fps             | 87        |\n",
            "|    time_elapsed    | 795       |\n",
            "|    total_timesteps | 69432     |\n",
            "| train/             |           |\n",
            "|    actor_loss      | 84.1      |\n",
            "|    critic_loss     | 22.2      |\n",
            "|    ent_coef        | 0.00183   |\n",
            "|    ent_coef_loss   | -25.2     |\n",
            "|    learning_rate   | 0.0001    |\n",
            "|    n_updates       | 69331     |\n",
            "|    reward          | 3.8324556 |\n",
            "----------------------------------\n"
          ]
        }
      ],
      "source": [
        "trained_sac = agent.train_model(model=model_sac,\n",
        "                                tb_log_name='sac',\n",
        "                                total_timesteps=70000) if if_using_sac else None"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 68,
      "metadata": {
        "id": "_SpZoQgPv7GO"
      },
      "outputs": [],
      "source": [
        "trained_sac.save(TRAINED_MODEL_DIR + \"/agent_sac\") if if_using_sac else None"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "PgGm3dQZfRks"
      },
      "source": [
        "## Save the trained agent\n",
        "The trained agents should already have been saved to the \"trained_models\" directory after you run the code blocks above.\n",
        "\n",
        "For Colab users, the zip files should be at \"./trained_models\" or \"/content/trained_models\".\n",
        "\n",
        "For users running in a local environment, the zip files should be at \"./trained_models\"."
      ]
    }
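    ,
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As a quick sanity check, a saved agent can be reloaded later for backtesting without retraining. The following is a minimal sketch that assumes Stable-Baselines3 is installed and the save paths above were used; swap in `PPO`, `TD3`, etc. to reload the other agents."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Reload a previously saved agent (sketch; assumes the SAC agent was trained and saved above)\n",
        "from stable_baselines3 import SAC\n",
        "\n",
        "loaded_sac = SAC.load(TRAINED_MODEL_DIR + \"/agent_sac\") if if_using_sac else None"
      ]
    }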
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [
        "MRiOtrywfAo1",
        "_gDkU-j-fCmZ",
        "3Zpv4S0-fDBv",
        "Dr49PotrfG01"
      ],
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.10.13"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
